feat(services): migrate page authoring (PR 7/N) #2109
mxkaske merged 5 commits into feat/services-connect-backfill
Seventh domain. Migrates the 13 authoring procedures in `pageRouter` onto `@openstatus/services/page`. Deliberately scoped to authoring CRUD only:

- `statusPage.ts`: public viewer endpoints (subscribe / get / uptime / report / verify / unsubscribe) are a separate surface that doesn't use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**`: ~1500 lines with 18 methods (page CRUD + components + groups + subscribers + view). Too big for this PR; dedicated follow-up, same shape as the Connect monitor deferral.

## Services (`packages/services/src/page/`)

- `createPage` / `newPage`: full vs minimal create; both enforce the `status-pages` plan cap and (for `createPage`) the per-access-type plan gates (password-protection, email-domain-protection, ip-restriction, no-index).
- `deletePage`: FK cascade clears components / groups / reports / subscribers.
- `listPages`: batched enrichment with `statusReports`.
- `getPage`: enriched with `maintenances` / `pageComponents` / `pageComponentGroups`.
- `getSlugAvailable`: pure check against `subdomainSafeList` + DB.
- `updatePageGeneral`: with slug-uniqueness re-check on change.
- `updatePageCustomDomain`: persists the DB change and returns the previous domain so the caller can diff. Vercel add/remove stays at the tRPC layer (external integration).
- `updatePagePasswordProtection`: re-applies the same plan gates the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales` (gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.

## tRPC (`packages/api/src/router/page.ts`)

All 13 procedures are thin wrappers. `delete` catches `NotFoundError` for idempotency. `updateCustomDomain` orchestrates:

1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.

## Enforcement

- Biome scope adds `packages/api/src/router/page.ts`. The router imports `insertPageSchema` via the services re-export (`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.

## Tests

- `__tests__/page.test.ts` covers `newPage` happy / reserved / duplicate, `createPage` monitor attachment + cross-workspace monitor, `updatePageGeneral` rename + duplicate-slug conflict + cross-workspace, `updatePageLocales` plan gate, list / get / slug-available workspace isolation, delete cross-workspace NotFoundError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
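The three-step `updateCustomDomain` orchestration from the tRPC section can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the `DomainDeps` interface, `planVercelSteps`, the empty-string convention for "no custom domain", and the add-before-remove ordering are all assumptions made for the example.

```typescript
// Hypothetical stand-ins for the page service and the Vercel helpers.
interface DomainDeps {
  getCurrentDomain(pageId: number): Promise<string>;
  addDomainToVercel(domain: string): Promise<void>;
  removeDomainFromVercel(domain: string): Promise<void>;
  persistCustomDomain(pageId: number, domain: string): Promise<void>;
}

type VercelStep = { action: "add" | "remove"; domain: string };

// Pure diff: which Vercel calls are needed to move from `previous` to `next`.
// Assumption: an empty string means "no custom domain".
function planVercelSteps(previous: string, next: string): VercelStep[] {
  if (previous === next) return [];
  const steps: VercelStep[] = [];
  if (next !== "") steps.push({ action: "add", domain: next });
  if (previous !== "") steps.push({ action: "remove", domain: previous });
  return steps;
}

async function updateCustomDomain(
  deps: DomainDeps,
  pageId: number,
  next: string,
): Promise<void> {
  // 1. Read the existing domain before anything changes.
  const previous = await deps.getCurrentDomain(pageId);
  // 2. Sync Vercel first, so a Vercel failure leaves the DB untouched.
  for (const step of planVercelSteps(previous, next)) {
    if (step.action === "add") await deps.addDomainToVercel(step.domain);
    else await deps.removeDomainFromVercel(step.domain);
  }
  // 3. Persist only after the external calls succeeded.
  await deps.persistCustomDomain(pageId, next);
}
```

Keeping the diff in a pure function makes the external-call plan testable without touching Vercel or the database.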
4 issues found across 12 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/services/src/page/schemas.ts">
<violation number="1" location="packages/services/src/page/schemas.ts:91">
P2: `allowedIpRanges` accepts arbitrary strings; restore CIDR validation in the service schema to prevent persisting invalid IP restriction rules.</violation>
</file>
<file name="packages/services/src/page/update.ts">
<violation number="1" location="packages/services/src/page/update.ts:130">
P2: `authEmailDomains` cannot be cleared because null input is converted to `undefined` and skipped in the DB update.</violation>
</file>
<file name="packages/services/src/page/create.ts">
<violation number="1" location="packages/services/src/page/create.ts:24">
P1: `createPage` skips `assertSlugAvailable`, so full-form creates can bypass reserved/duplicate slug validation before insert.</violation>
<violation number="2" location="packages/services/src/page/create.ts:28">
P1: `createPage` does not pass `allowIndex` into `assertAccessTypeAllowed`, bypassing the `no-index` plan gate on create.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
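For the `allowedIpRanges` finding, a CIDR check along these lines would reject arbitrary strings. This is a hand-rolled sketch: `normalizeIpRange` and `isCidrV4` are hypothetical names, and the regex-based validator is only a simplified stand-in for a schema-level validator such as zod's `z.cidrv4()`.

```typescript
// Simplified IPv4 CIDR check (stand-in for a schema validator; it does not
// reject leading zeros in octets, for example).
function isCidrV4(value: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(value);
  if (!match) return false;
  const octets = match.slice(1, 5).map(Number);
  const prefix = Number(match[5]);
  return octets.every((octet) => octet <= 255) && prefix <= 32;
}

// Bare IPs get "/32" appended, then everything goes through the CIDR check.
function normalizeIpRange(value: string): string {
  const trimmed = value.trim();
  const candidate = trimmed.includes("/") ? trimmed : `${trimmed}/32`;
  if (!isCidrV4(candidate)) {
    throw new Error(`Invalid IPv4 CIDR range: ${value}`);
  }
  return candidate;
}
```

Normalizing before validating means a single-host rule and a range rule end up stored in the same shape.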
Extends PR #2109 to cover the Connect RPC status-page handler's page CRUD surface (create / get / list / update / delete), matching the migration that landed for tRPC's `pageRouter`. The other 13 methods (components, groups, subscribers, viewer) still read the db directly; they're separate domains that'll need their own services in follow-ups.

- create / get / delete call into `@openstatus/services/page` and preserve the granular Connect errors (`statusPageNotFoundError`, `slugAlreadyExistsError`) by pre-checking before the service call or catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the per-section updates (`updatePageGeneral`, `updatePageLinks`, `updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`, `updatePagePasswordProtection`) inside a shared transaction so a partial failure can't leave the page half-updated. Each service's internal `withTransaction` detects the pre-opened tx and skips nesting.
- Proto-specific format validations (https icon URL, custom-domain regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied path stay at the handler; they don't exist in the zod insert schema and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges` into arrays, while the converters (still used by the unmigrated methods) expect the comma-joined string form. `serviceToConverterPage` bridges the two shapes at the call sites that need it.

Biome scope deliberately unchanged: the file still imports from `@openstatus/db` for the 13 legacy methods, so the override would light up the whole file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
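The "detects the pre-opened tx and skips nesting" behavior can be illustrated with a toy model. Everything here is assumed for the example: the `Executor` shape, the `inTransaction` flag, and the synchronous control flow (a real database client would be async); only the name `withTransaction` comes from the commit message.

```typescript
// Toy executor: `log` records statements; `inTransaction` marks a tx handle.
interface Executor { inTransaction: boolean; log: string[] }
interface ServiceContext { db: Executor }

function withTransaction<T>(ctx: ServiceContext, fn: (ctx: ServiceContext) => T): T {
  // Already inside a transaction: run on the caller's handle, no nesting.
  if (ctx.db.inTransaction) return fn(ctx);
  const tx: Executor = { inTransaction: true, log: ctx.db.log };
  tx.log.push("BEGIN");
  try {
    const result = fn({ db: tx });
    tx.log.push("COMMIT");
    return result;
  } catch (err) {
    tx.log.push("ROLLBACK");
    throw err;
  }
}

// Two per-section "services", each wrapping its own work in withTransaction.
function updateGeneral(ctx: ServiceContext): void {
  withTransaction(ctx, (inner) => { inner.db.log.push("update general"); });
}

function updateLinks(ctx: ServiceContext, fail: boolean): void {
  withTransaction(ctx, (inner) => {
    if (fail) throw new Error("links update failed");
    inner.db.log.push("update links");
  });
}

// The handler opens one outer transaction and threads it through both calls.
function updateStatusPage(ctx: ServiceContext, failLinks: boolean): void {
  withTransaction(ctx, (outer) => {
    updateGeneral(outer);
    updateLinks(outer, failLinks);
  });
}
```

In this model a failure in the second section produces a single `ROLLBACK` for the whole update, and the inner calls never emit their own `BEGIN`.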
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/server/src/routes/rpc/services/status-page/index.ts">
<violation number="1" location="apps/server/src/routes/rpc/services/status-page/index.ts:517">
P1: Stale `authEmailDomains` not cleared when switching access type. Setting `nextAuthEmailDomains = undefined` doesn't clear the DB column because the service's `.set({ authEmailDomains: undefined?.join(",") })` evaluates to `undefined`, which Drizzle ignores. The corresponding `allowedIpRanges` line in the service has `?? null` but `authEmailDomains` does not — the service line `authEmailDomains: input.authEmailDomains?.join(",")` in `packages/services/src/page/update.ts` needs a matching `?? null` fallback to clear the column.</violation>
</file>
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
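The `?? null` fix is easy to reproduce in isolation. The sketch below models the "skip `undefined` columns" behavior described above with a hypothetical `applySet` helper; it is not Drizzle itself, and the `Row` shape is an assumption for illustration.

```typescript
interface Row { authEmailDomains: string | null; allowedIpRanges: string | null }
type Patch = { [K in keyof Row]?: Row[K] | undefined };

// Models the described `.set()` semantics: undefined means "skip this column".
function applySet(row: Row, patch: Patch): Row {
  const next: Row = { ...row };
  for (const key of Object.keys(patch) as (keyof Row)[]) {
    const value = patch[key];
    if (value !== undefined) next[key] = value;
  }
  return next;
}

// Before the fix: a cleared field becomes undefined and is silently skipped.
function buggyPatch(input: { authEmailDomains?: string[] }): Patch {
  return { authEmailDomains: input.authEmailDomains?.join(",") };
}

// After the fix: a cleared field becomes an explicit NULL write.
function fixedPatch(input: { authEmailDomains?: string[] }): Patch {
  return { authEmailDomains: input.authEmailDomains?.join(",") ?? null };
}

const before: Row = { authEmailDomains: "a.com,b.com", allowedIpRanges: null };
const stale = applySet(before, buggyPatch({}));   // column keeps "a.com,b.com"
const cleared = applySet(before, fixedPatch({})); // column becomes null
```

A regression test for the access-type switch could assert exactly this difference against the real service.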
@claude review

Claude finished @mxkaske's task in 4m 19s.

Review

| Issue | Status |
|---|---|
| `createPage` skips `assertSlugAvailable` | ✅ Fixed |
| `createPage` skips `allowIndex` in `assertAccessTypeAllowed` | ✅ Fixed |
| `allowedIpRanges` accepts arbitrary strings in schema | ✅ Fixed (CIDR pipeline added) |
| `authEmailDomains` can't be cleared (`undefined` → Drizzle skip) | ✅ Fixed (`?? null` at `update.ts:134`) |
| Connect `nextAuthEmailDomains = undefined` doesn't clear DB | ✅ Fixed (same `?? null` in service handles this) |
Observations

**P2: `updateCustomDomain` in tRPC does a wasteful `getPage` read (`page.ts:222`)**

`getPage` is called solely to obtain `current.customDomain`, but it fires 3 parallel DB queries (`maintenances` + `pageComponents` + `pageComponentGroups`). The service's `updatePageCustomDomain` already reads the page internally (`getPageInWorkspace`), so the tRPC path makes 4 DB round-trips when 1–2 suffice. The service was designed to return `existingDomain` to help with this, but that return value can't be used here because the Vercel calls need the old domain before the update (so DB and Vercel stay in sync if Vercel throws). The simplest fix is to add a thin `getCustomDomain(id)` helper in the service, or simply inline `getPageInWorkspace` at the transport layer for this one field. Not a functional bug, but a measurable overhead on every custom-domain update.

**P2: Slug-uniqueness TOCTOU in `updateStatusPage` (`index.ts:492-498`)**

The handler pre-checks slug with `getPageBySlug` to surface `slugAlreadyExistsError` (`Code.AlreadyExists` + metadata). Then `updatePageGeneral` re-validates internally via `assertSlugAvailable`. In the race where two concurrent callers both clear the pre-check simultaneously, the loser of the DB race gets the service's `ConflictError`, which `toConnectError` maps to `Code.InvalidArgument`, not `Code.AlreadyExists`. Extremely low probability, but callers that key on the gRPC code for retry/UX logic would see an inconsistent code. The handler could catch `ConflictError` from `updatePageGeneral` and rethrow as `slugAlreadyExistsError` to eliminate the gap.
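One way to close the TOCTOU gap is a small wrapper that translates the service-level conflict into the handler's richer error. The sketch is synchronous for brevity and uses hypothetical stand-ins: `RpcError`, its `"already_exists"` code string, and `withSlugConflictMapping` are assumptions, not the handler's actual types.

```typescript
// Stand-in for the service-layer conflict error.
class ConflictError extends Error {}

// Hypothetical stand-in for a Connect error carrying a code and metadata.
class RpcError extends Error {
  code: string;
  metadata: Record<string, string>;
  constructor(message: string, code: string, metadata: Record<string, string>) {
    super(message);
    this.code = code;
    this.metadata = metadata;
  }
}

function slugAlreadyExistsError(slug: string): RpcError {
  return new RpcError(`Slug "${slug}" already exists`, "already_exists", { slug });
}

// Runs `run`; a ConflictError raised inside it is rethrown as the richer
// slug error, every other error passes through untouched.
function withSlugConflictMapping<T>(slug: string, run: () => T): T {
  try {
    return run();
  } catch (err) {
    if (err instanceof ConflictError) throw slugAlreadyExistsError(slug);
    throw err;
  }
}
```

With this in place the pre-check and the race-loser path would surface the same error code to callers.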
**P2: Connect API writes `customDomain` to DB without syncing Vercel**

Both `createStatusPage` and `updateStatusPage` accept and persist `customDomain`, but neither calls the Vercel add/remove API. Confirmed this predates this PR (the old direct-DB handler had the same gap), so it's not a regression. Still worth calling out: Connect clients setting a custom domain via gRPC will get a DB row that says the domain is set, but routing won't actually work until a tRPC/dashboard round-trip happens. Worth a comment in the handler and/or a follow-up issue so it doesn't go unnoticed.

**P3: Deprecated `passwordProtected` boolean not updated in `updatePagePasswordProtection` (`update.ts:127-140`)**

The column is marked `@deprecated` in the schema, and the v1 REST read path already derives it from `accessType` (via `normalizePasswordProtected` in `apps/server/src/routes/v1/pages/schema.ts:131`), so in practice nothing breaks. Just confirming this was a deliberate omission rather than an oversight.

**P3: `void ConflictError` workaround (`create.ts:142`)**

`ConflictError` is imported in `create.ts` but never directly referenced there (it's used in `internal.ts`). The `void ConflictError;` trick silences the TS "unused import" warning. Better to just remove the import; if the tsconfig complains about something else, that's a separate issue to address at the tsconfig level rather than with a runtime no-op.

**P3: Double cast `as unknown as Page` (`create.ts:90`, `create.ts:137`)**

`return row as unknown as Page` indicates the drizzle inferred type and the `Page` type aren't structurally aligned: a double cast is needed to suppress TS. The `Page` type is presumably correct at runtime, but this is a sign the type alias is either looser or stricter than what drizzle returns. If there's a mismatch (e.g. `allowedIpRanges` as `string` in drizzle vs `string[]` in `Page`), callers relying on the typed shape could hit surprises.
Test coverage gaps

The test file covers: `newPage`, `createPage`, `updatePageGeneral`, `updatePageLocales`, list/get/`getSlugAvailable`, `deletePage`. The following service functions have zero integration tests:

- `updatePagePasswordProtection`: most complex update function; CIDR validation, `authEmailDomains` clearing, plan gating
- `updatePageCustomDomain`: returns `existingDomain`; worth verifying the round-trip
- `updatePageAppearance`: config merge logic
- `updatePageLinks`: simple but untested
- `updatePageConfiguration`: config merge logic

Not blocking for this migration PR, but worth tracking. The access-type clearing fix in particular (`authEmailDomains: ... ?? null`) would benefit from a regression test.
What's well done

- Layering is clean: services have zero external-API calls; Vercel, analytics, and external integrations stay at the transport layer.
- Transaction wrapping in `updateStatusPage` is the right call: per-section service calls each re-use the outer `tx` via `ctx.db`, so partial failures on multi-section updates are properly rolled back.
- `authEmailDomains` / `allowedIpRanges` clearing with `?? null` is correct and the comment in `update.ts:130-133` explains the intent well.
- `assertSlugAvailable` case-insensitive comparison (``sql`lower(${page.slug}) = ${slug}` ``) correctly handles slug collisions across mixed-case existing data.
- CIDR pipeline in `UpdatePagePasswordProtectionInput` (bare IP → `/32` append → `z.cidrv4()`) is clean and reusable.
- `serviceToConverterPage` bridge is a pragmatic shim that keeps the 13 still-on-DB methods working without touching their converters.
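The shape mismatch that `serviceToConverterPage` bridges (arrays on the service side, comma-joined strings on the converter side) could look roughly like this. The two interfaces are trimmed-down assumptions, and treating an empty array as `null` is a choice made for the example rather than confirmed behavior.

```typescript
// Assumed, trimmed-down shapes; the real types carry many more columns.
interface ServicePage {
  id: number;
  slug: string;
  authEmailDomains: string[] | null;
  allowedIpRanges: string[] | null;
}

interface ConverterPage {
  id: number;
  slug: string;
  authEmailDomains: string | null;
  allowedIpRanges: string | null;
}

// Assumption: an empty list is stored the same way as "not set".
function joinOrNull(values: string[] | null): string | null {
  return values && values.length > 0 ? values.join(",") : null;
}

function serviceToConverterPage(page: ServicePage): ConverterPage {
  return {
    ...page,
    authEmailDomains: joinOrNull(page.authEmailDomains),
    allowedIpRanges: joinOrNull(page.allowedIpRanges),
  };
}
```

A typed mapper like this also avoids the `as unknown as` double cast, because the compiler checks each field of the target shape.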
…/N) (#2110)

* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)

Stacked on PR #2109. Eighth migration: four small domains consolidated into one PR because each is narrow (roughly two to five procedures) and they share no structural dependencies beyond already-migrated infrastructure.

### Services (`packages/services/src/{workspace,user,invitation,api-key}/`)

**workspace**: `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors + notifications + page-components batched via drizzle relations), `listWorkspaces` (takes `userId` explicitly since `list` runs across every workspace the user has access to), `updateWorkspaceName`.

**user**: `getUser` (active, non-soft-deleted), `deleteAccount` (the paid-plan guardrail stays; removes non-owned memberships, sessions, OAuth accounts and blanks the PII columns inside a single tx).

**invitation**: `createInvitation` (plan gate counts pending invites against the members cap so two outstanding invites can't both accept past the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken` (scoped by token **and** accepting email to prevent token-sharing), `acceptInvitation` (stamps acceptedAt + inserts membership atomically).

**api-key**: `createApiKey` (returns plaintext token once), `revokeApiKey` (workspace-scoped existence check inside the tx so concurrent revokes resolve to a consistent NotFound rather than a silent no-op), `listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a single IN query for creator enrichment), `verifyApiKey` + `updateApiKeyLastUsed` (no ctx required: the verify path runs before workspace resolution and callers pass an optional `db` override).

### tRPC (`packages/api/src/router/{workspace,user,invitation,apiKey}.ts`)

All 14 procedures become thin `try { return await serviceFn(...) } catch { toTRPCError }` wrappers. Router shapes stay identical so the dashboard needs no changes.
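The thin-wrapper pattern named above amounts to one error-translation function plus a try/catch. The sketch below is illustrative only: the error classes and `TRPCLikeError` are local stand-ins, `callService` is a hypothetical helper, and only the `ValidationError` → `BAD_REQUEST` and `NotFoundError` → `NOT_FOUND` pairs are taken from these commit messages; the fallback code is an assumption.

```typescript
// Local stand-ins for the service-layer errors.
class NotFoundError extends Error {}
class ValidationError extends Error {}

// Stand-in for a tRPC error: a code string plus a message.
class TRPCLikeError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Translate service errors into transport errors in one place.
function toTRPCError(err: unknown): TRPCLikeError {
  if (err instanceof NotFoundError) return new TRPCLikeError("NOT_FOUND", err.message);
  if (err instanceof ValidationError) return new TRPCLikeError("BAD_REQUEST", err.message);
  // Assumption: anything unrecognized is reported as an internal error.
  return new TRPCLikeError("INTERNAL_SERVER_ERROR", "Unexpected error");
}

// The wrapper every procedure reduces to: call the service, translate failures.
async function callService<T>(serviceFn: () => Promise<T>): Promise<T> {
  try {
    return await serviceFn();
  } catch (err) {
    throw toTRPCError(err);
  }
}
```

Because the translation lives in one function, router shapes and error codes stay stable while the business logic moves into the services package.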
Connect + Slack don't expose these domains today; migrating their consumers is a follow-up.

### Enforcement

Biome `noRestrictedImports` override adds the four router files. Subpath exports `@openstatus/services/{workspace,user,invitation,api-key}` added to the services package.

### Cleanup

Deletes `packages/api/src/service/apiKey.ts` and its tests, fully superseded by `packages/services/src/api-key/`. The auth middleware in `apps/server` has its own inline apiKey verification and is unaffected.

### Deliberately out of scope

- **`domain.ts`**: pure Vercel-API proxy with no DB usage; not part of the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`**: import migration is PR 9; telegram-updates stays for a follow-up.

### Tests

Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage counts, members cap hit on free plan, invitation token-mismatch rejection, accept idempotency, api-key creation returning a bcrypt hash, list creator enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy / bad-format / wrong-body paths, lastUsed debounce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(services): address Cubic review on #2110

Four issues flagged on PR 8:

- **P1 (`invitation/accept.ts`)**: the read-then-write pattern let two concurrent accepts both pass the `isNull(acceptedAt)` check and race through the membership insert. Replaced with a conditional UPDATE that re-asserts `isNull(acceptedAt)` in the WHERE clause and checks `.returning()` rowcount. The loser gets `ConflictError`, the tx aborts before membership inserts run.
- **P2 (`api-key/create.ts`)**: `createdById` was taken from input and the router spliced in `ctx.user.id`. Since that column is attribution data (who owns the key, who the audit row blames), trusting input would let any caller forge ownership.
  Derived from `ctx.actor` via `tryGetActorUserId`; actors without a resolvable user id (system / webhook / unlinked api-key) now get `UnauthorizedError` instead of a silent NULL write. `createdById` removed from the input schema.
- **P2 (`invitation/delete.ts`)**: audit row was emitted even when the DELETE matched zero rows (unknown id / wrong workspace). Switched to `.returning({ id })` and short-circuit before the audit emit so the log only reflects actual deletions.
- **P2 (`invitation/list.ts`)**: the `if (!input.email)` → `UnauthorizedError` branch in `getInvitationByToken` was unreachable because `z.email()` already rejects empty / malformed emails at `.parse()`. Removed the dead branch; the router keeps its own pre-call check for `ctx.user.email`, so the transport-level UnauthorizedError path is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(services): migrate import domain (PR 9/N) (#2111)

* feat(services): migrate import domain (PR 9/N)

Stacked on PR #2110. Ninth and final domain: lifts the ~1,000-line `packages/api/src/service/import.ts` orchestrator into the services package as its own `@openstatus/services/import` domain.

### Services (`packages/services/src/import/`)

Split into focused files:

- **`schemas.ts`**: `PreviewImportInput` / `RunImportInput` zod. Provider discriminator + per-provider page-id fields live here; options schema is separately exported for callers that want to pre-validate.
- **`provider.ts`**: `createProvider` factory + `buildProviderConfig` reshape helper, isolated from the orchestrator so adding a provider is a one-file change.
- **`limits.ts`**: `addLimitWarnings` (shared by preview + run). Pure mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`**: `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`**: the seven phase writers (page / component groups / components / incidents / maintenances / monitors / subscribers).
  Each takes a `DB` explicitly so callers can thread a pre-opened tx; failing resources get `status: "failed"` with an error string rather than throwing.
- **`preview.ts`**: dry-run only; validates credentials, runs the provider with `dryRun: true`, emits warnings.
- **`run.ts`**: the orchestrator. Now owns the `pageId` ownership check (previously duplicated in the tRPC router) and emits exactly **one** `import.run` audit row regardless of outcome so partial / failed runs still show up in the audit signal. Deliberately *not* wrapped in `withTransaction`: imports can span minutes across dozens of writes and the existing UX is phase-level recovery.

### tRPC (`packages/api/src/router/import.ts`)

124 lines → 28 lines. The router is now a thin `previewImport` / `runImport` wrapper; the input schemas and all validation live in the service. The router-level `TRPCError`-throwing `pageId` ownership check moved into `runImport` so non-tRPC callers (Slack / future) get the same guard.

### Error shape changes

- Provider validation failure: `TRPCError("BAD_REQUEST")` → `ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` → `NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.

### Tests

- Unit tests for `addLimitWarnings` / `clampPeriodicity` / `computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`) that previously called `previewImport` / `runImport` directly to override workspace limits now route through `makeCaller(limitsOverride)` with an explicit `provider: "statuspage"` field. This also fixes four pre-existing TypeScript errors where those calls were missing the (required) provider discriminator.

### Enforcement

- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json` bumped to `moduleResolution: "bundler"` so the importers package-exports map resolves (same setting `packages/api` already uses).

### Cleanup

Deletes `packages/api/src/service/import.ts` (1042 lines) and its test file (463 lines). Only `telegram-updates.ts` remains in `packages/api/src/service/`; that's slated for a follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(services/import): per-resource audit + Cubic fixes on #2111

Two changes folded together:

### Per-resource audit

Every phase writer now emits one `emitAudit` row per *created* resource, matching what the domain services emit for normal CRUD:

| Phase | Audit action |
| --- | --- |
| page | `page.create` |
| componentGroups | `page_component_group.create` |
| components | `page_component.create` |
| monitors | `monitor.create` |
| incidents | `status_report.create` + `status_report.add_update` per update |
| maintenances | `maintenance.create` |
| subscribers | `page_subscriber.create` |

Skipped resources don't emit (their original create audit already exists); failed resources don't emit (nothing was written); link-table rows (statusReportsToPageComponents etc.) don't emit (edges, not entities).

Metadata always carries `source: "import"` + `provider: <name>` + `sourceId: <provider-id>` so the audit trail traces back to the source system.

The rollup `import.run` audit still fires at the end: the per-resource rows give forensic granularity, the run-level row gives "this bulk operation happened" without scanning the full summary blob.

For the change, phase writers now take a shared `PhaseContext = { ctx, tx, provider }` instead of `(db, workspaceId, limits)`. The orchestrator builds one `PhaseContext` per run and threads it through, giving each writer access to `ctx.actor` for audit attribution. `statusReportUpdate` writes now use `.returning({ id })` so the per-update audit can attribute the right row.
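The emit rules above (created resources only, with `source` / `provider` / `sourceId` metadata) can be sketched as a small pure function. The `ImportedResource` and `AuditEntry` shapes and the `auditCreatedResources` name are hypothetical; only the rules themselves and the metadata keys come from the commit message.

```typescript
type ResourceStatus = "created" | "skipped" | "failed";

// Hypothetical per-resource result produced by a phase writer.
interface ImportedResource {
  sourceId: string;
  status: ResourceStatus;
  id?: number; // database id, present only when a row was written
}

interface AuditEntry {
  action: string;
  entityId: number;
  metadata: { source: "import"; provider: string; sourceId: string };
}

// One audit entry per created resource; skipped and failed resources emit nothing.
function auditCreatedResources(
  action: string,
  provider: string,
  resources: ImportedResource[],
): AuditEntry[] {
  const entries: AuditEntry[] = [];
  for (const resource of resources) {
    if (resource.status !== "created" || resource.id === undefined) continue;
    entries.push({
      action,
      entityId: resource.id,
      metadata: { source: "import", provider, sourceId: resource.sourceId },
    });
  }
  return entries;
}

const monitorAudit = auditCreatedResources("monitor.create", "statuspage", [
  { sourceId: "sp-1", status: "created", id: 11 },
  { sourceId: "sp-2", status: "skipped" },
  { sourceId: "sp-3", status: "failed" },
]);
```

Here `monitorAudit` holds a single entry for `sp-1`, which keeps the audit trail limited to rows that were actually written.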
### Cubic review fixes

- **`run.ts:130`**: phases after `page` kept their provider-assigned status when `targetPageId` was falsy but the user option wasn't `false`. Replaced the narrow `else if (option === false)` branches with a plain `else → phase.status = "skipped"`, matching what `subscribers` already did.
- **`run.ts:147`**: when the `components` phase hit `remaining <= 0`, the phase was marked `"failed"` but individual resource statuses were left stale with no error string. Each resource is now marked `"skipped"` with `"Skipped: component limit reached (N)"`, matching `writeMonitorsPhase`. Phase-level status becomes `"skipped"` too (was `"failed"`; failed implied a writer error, this is really a plan-limit pre-check).
- **`provider.ts`**: both `createProvider` and `buildProviderConfig` had a `default:` that silently ran the Statuspage adapter for any unknown provider name, which would mask a typo by handing a non-Statuspage api key to the wrong adapter. Replaced with exhaustive `case "statuspage"` + `never`-typed default throw.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)

The symbolic deliverable from the plan's "close the loop" PR. Renames `apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/` so the distinction between "the services layer" (owns business logic, lives in `packages/services`) and "Connect transport handlers" (thin proto → service → proto wrappers) is permanent and visible in the path. Keeping the old name invites the next developer to "just add one small thing" to a file under a `services/` folder months later; the rename makes the layering explicit.

### Changes

- `git mv` of the six domain subdirectories + their tests (health / maintenance / monitor / notification / status-page / status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope: the health handler has no db usage today; including it locks in that invariant.

### Still out of scope (follow-ups)

Rather than pretending the full "close the loop" deliverable is possible today, the biome.jsonc comment now enumerates exactly what remains unmigrated:

- `packages/api/src/router/statusPage.ts`: public viewer endpoints under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,email}.ts`: small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**`: 6 jobType-specific methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**`: page CRUD is migrated (PR 7), but components / groups / subscribers / viewer (13 methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**`: the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts`: tools, handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**`: public-facing HTTP routes.

Each of the above is its own PR-sized migration. The final consolidation (broadening to `router/**` + dropping `@openstatus/db` from `packages/api` and `apps/server`) is conditional on all of them landing first.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(services/import): use ctx workspaceId for page insert

`writePagePhase` was inserting with `data.workspaceId`, the value the provider package round-tripped into resource data. Every other phase writer (monitor / components / subscriber) already reads `workspaceId` from `ctx.workspace.id`; this lines the page insert up with that pattern.
Defends against the (unlikely) case where a provider mapper serialises the wrong workspace id into its output, since `ctx` is the authoritative source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(services): address Claude review findings on #2110

Six findings from Claude's review pass: five code/doc fixes, one documentation-only note.

**P2: `acceptInvitation` derives userId from `ctx.actor`.** Was taking it from input: the email scoped *which* invitation could be accepted, but not *who* the membership was inserted for. A caller with the right token+email could insert a membership under an arbitrary user id. Removed `userId` from `AcceptInvitationInput`; derived from `tryGetActorUserId(ctx.actor)`, throws `UnauthorizedError` for non-user actors. Mirrors the same pattern applied to `createApiKey.createdById` in the Cubic pass. Router and test updated accordingly.

**P2: `getWorkspace` throws `NotFoundError` explicitly.** `findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing `ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape every other service uses. Unreachable in practice (ctx.workspace is resolved upstream) but the error shape was the only outlier; consistency matters for callers pattern-matching on error codes.

**P3: `listApiKeys` filters null `createdById` before the IN query.** The new `createApiKey` path enforces a non-null creator, but legacy rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` (technically safe), but drizzle types model the array as `number[]`. Filtering upfront keeps the types honest and sidesteps any future surprise.

**P3: `deleteInvitation` guards `acceptedAt IS NULL`.** The WHERE previously allowed hard-deleting *accepted* invitations, wiping the "user was invited on X" breadcrumb. Added the `isNull(acceptedAt)` guard + doc comment explaining the audit-trail preservation intent.
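The "derive the user id from `ctx.actor`, never from input" pattern used by `acceptInvitation` and `createApiKey` can be sketched as below. The `Actor` union is a guess at a plausible shape and `requireActorUserId` is a hypothetical helper; only the name `tryGetActorUserId`, the `UnauthorizedError` outcome, and the list of non-user actors (system / webhook / unlinked api-key) come from the commit messages.

```typescript
class UnauthorizedError extends Error {}

// Assumed actor shapes; an api key may or may not be linked to a user.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string; userId?: number }
  | { type: "system" }
  | { type: "webhook" };

// Returns the acting user's id, or undefined when no user can be resolved.
function tryGetActorUserId(actor: Actor): number | undefined {
  switch (actor.type) {
    case "user":
      return actor.userId;
    case "apiKey":
      return actor.userId;
    default:
      return undefined;
  }
}

// Service-side guard: attribution always comes from the actor, and actors
// without a resolvable user are rejected instead of writing NULL.
function requireActorUserId(actor: Actor): number {
  const userId = tryGetActorUserId(actor);
  if (userId === undefined) {
    throw new UnauthorizedError("This action requires a user actor");
  }
  return userId;
}
```

Since the input schema no longer carries a user id at all, a caller has no field through which to forge ownership or membership.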
**Doc-only: `deleteAccount` orphan comment.** Non-owner memberships are removed, but owner memberships + owned workspaces survive. Matches legacy behavior. Added a scope-note docblock flagging that workspace cleanup is explicitly out of scope (belongs to a future admin / scheduled job).

**Doc-only: `createInvitation` role comment.** The invite insert lets `role` fall through to the schema default (`member`). Matches legacy (which also only picked `email`). Comment added so the absence reads as deliberate rather than overlooked.

Minor: the concurrent-accept race test is covered by the conditional UPDATE + `ConflictError` path from the earlier P1 fix; mocking it reliably against SQLite is noisy and not worth the test complexity. Documented in the related code comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(services): address Claude re-review findings on #2110

Four issues surfaced after the first round of fixes on this PR:

**P2: `listApiKeys` crashes on all-legacy keys.** After the null filter added in the previous commit, workspaces whose keys all pre-date the services migration (every `createdById` null) end up with `creatorIds === []`. Drizzle throws "At least one value must be provided" on an empty `inArray`, taking the whole endpoint down. Added an early return that maps `createdBy: undefined` when there are no non-null creator ids to look up.

**P2: `getWorkspaceWithUsage` ZodError on missing row.** Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as `getWorkspace`, but without the `NotFoundError` guard that got added in the earlier pass. Added the guard. Also cleaned up the usage block, which no longer needs optional chaining once the narrowing fires.

**P2: `deleteAccount` took `userId` from input.** Completing the `createApiKey` / `acceptInvitation` pattern: account deletion must target `ctx.actor`, never an arbitrary id.
Dropped `userId` from `DeleteAccountInput` (now an empty forward-compat shape), derived inside the service via `tryGetActorUserId`, throws `UnauthorizedError` for non-user actors. Router updated to stop passing it.

**P3: `createInvitation` dev-token log could leak in tests.** Tightened the comment around the `process.env.NODE_ENV === "development"` guard to flag that strict equality is load-bearing: bun:test sets `NODE_ENV=test` and CI leaves it undefined, both of which correctly skip the log. No behavior change, just a clearer contract so the next reader doesn't loosen it.

Cubic's two findings on this review pass point at `packages/api/src/router/import.ts` and `packages/services/src/import/limits.ts`. Both live in the next PR up the stack (#2111 / feat/services-import) and will be addressed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(services): address latest Cubic pass on #2110

Four findings from the third Cubic review (now that #2111's import domain is included in the #2110 diff via the stack):

**P2: biome.jsonc notification handler scope.** Only `notification/index.ts` was in the `noRestrictedImports` override. Sibling files (`errors.ts`, `test-providers.ts`) were outside the migration guard, so new db imports could land in them without the lint failing. Broadened to `notification/**` and moved the two files that *legitimately* still read db (`limits.ts` querying workspace quotas, `converters.ts` needing db enum shapes for proto round-trip) into the `ignore` list. Future siblings are enforced by default rather than silently slipping through.

**P2: `clampPeriodicity` unknown values returned too fast.** `PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0` → walk started at `"30s"` (the fastest tier). Could return an interval faster than requested, violating the "never-faster-than-requested" invariant. Short-circuits now to the slowest allowed tier when the requested value isn't a known periodicity.
Added unit tests covering the unknown-value + empty- allowed fallback paths. **P2 — component/monitor limit warnings counted total resources, not quota-consuming inserts.** If the import contained 4 components and 3 already existed (would be skipped as duplicates), the warning claimed `"Only X of 4 can be imported"` — but actually zero quota would be consumed by the 3 skips, so the real new-creation count might fit entirely. Reworded to `"Only N new components may be created … some of the M in the import may already exist and be skipped"`. Same treatment for the monitors warning. Preview stays DB-light (no per-resource existence checks); the warning now honestly conveys worst-case without misleading users about what will actually happen. Test assertions updated to match the new wording with substring matches that aren't tied to the exact fraction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
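The `clampPeriodicity` short-circuit described above can be sketched roughly as follows. This is a minimal illustration, not the code in `utils.ts`: only `"30s"` and the `PERIODICITY_ORDER` name come from the commit message, while the other tier names, the function signature, and the empty-allowed fallback to the slowest known tier are assumptions.

```typescript
// Hypothetical tier list, ordered fastest to slowest. Only "30s" is named in
// the commit message; the remaining values are illustrative assumptions.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicity(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity {
  // Slowest allowed tier. Assumed fallback when nothing is allowed: "1h".
  let slowest: Periodicity = "1h";
  for (const tier of PERIODICITY_ORDER) {
    if (allowed.includes(tier)) slowest = tier;
  }

  const requestedIndex = PERIODICITY_ORDER.indexOf(requested as Periodicity);
  // Unknown value: indexOf returns -1. Short-circuit here instead of clamping
  // the index to 0, which would start the walk at the fastest tier.
  if (requestedIndex === -1) return slowest;

  // Walk from the requested tier toward slower tiers and take the first
  // one the plan allows, so the result is never faster than requested.
  for (const candidate of PERIODICITY_ORDER.slice(requestedIndex)) {
    if (allowed.includes(candidate)) return candidate;
  }
  return slowest;
}
```

With this shape, an unknown value and an over-fast request both resolve to a tier the plan permits, and the unknown case no longer lands on `"30s"`.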
5 issues found across 80 files (changes from recent commits).
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. cubic prioritises the most important files to review.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/services/src/workspace/schemas.ts">
<violation number="1" location="packages/services/src/workspace/schemas.ts:15">
P2: `name` validation allows whitespace-only workspace names. Trim before `min(1)` so blank names are rejected.</violation>
</file>
<file name="packages/services/src/import/schemas.ts">
<violation number="1" location="packages/services/src/import/schemas.ts:25">
P2: Restrict `pageId` to positive integers. Accepting `0`/negative values bypasses truthy-gated ownership and page-target logic, so invalid IDs are silently treated as missing IDs.</violation>
</file>
<file name="packages/services/src/import/phase-writers.ts">
<violation number="1" location="packages/services/src/import/phase-writers.ts:126">
P1: Missing idempotency check in `writeMaintenancesPhase`. Same pattern gap as incidents — every other writer deduplicates, but this one always inserts. A retried import will duplicate all maintenances. Consider matching on `title + from + to + pageId` before inserting.</violation>
<violation number="2" location="packages/services/src/import/phase-writers.ts:358">
P1: Missing idempotency check in `writeIncidentsPhase`. Every other phase writer deduplicates against existing records before inserting, but this one unconditionally creates new `statusReport` rows. A retried import will duplicate all incidents. Consider matching on `title + pageId` (or a provider-stable `sourceId`) before inserting.</violation>
</file>
<file name="packages/services/src/invitation/__tests__/invitation.test.ts">
<violation number="1" location="packages/services/src/invitation/__tests__/invitation.test.ts:87">
P2: Seed a member or existing pending invite before asserting the free-plan cap, otherwise this test will never hit `LimitExceededError`.</violation>
</file>
    return existingBySlug.id;
    }

    const [inserted] = await tx
P1: Missing idempotency check in writeMaintenancesPhase. Same pattern gap as incidents — every other writer deduplicates, but this one always inserts. A retried import will duplicate all maintenances. Consider matching on title + from + to + pageId before inserting.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/services/src/import/phase-writers.ts, line 126:
<comment>Missing idempotency check in `writeMaintenancesPhase`. Same pattern gap as incidents — every other writer deduplicates, but this one always inserts. A retried import will duplicate all maintenances. Consider matching on `title + from + to + pageId` before inserting.</comment>
<file context>
@@ -0,0 +1,761 @@
+ return existingBySlug.id;
+ }
+
+ const [inserted] = await tx
+ .insert(page)
+ .values({
</file context>
    sourceComponentIds: string[];
    };

    const [insertedReport] = await tx
P1: Missing idempotency check in writeIncidentsPhase. Every other phase writer deduplicates against existing records before inserting, but this one unconditionally creates new statusReport rows. A retried import will duplicate all incidents. Consider matching on title + pageId (or a provider-stable sourceId) before inserting.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/services/src/import/phase-writers.ts, line 358:
<comment>Missing idempotency check in `writeIncidentsPhase`. Every other phase writer deduplicates against existing records before inserting, but this one unconditionally creates new `statusReport` rows. A retried import will duplicate all incidents. Consider matching on `title + pageId` (or a provider-stable `sourceId`) before inserting.</comment>
<file context>
@@ -0,0 +1,761 @@
+ sourceComponentIds: string[];
+ };
+
+ const [insertedReport] = await tx
+ .insert(statusReport)
+ .values({
</file context>
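A rough sketch of the dedupe-before-insert check the review asks for on the incidents and maintenances writers. The row shape, the in-memory array standing in for the table, and the `writeIncident` name are all hypothetical; the real writers run against drizzle inside a transaction.

```typescript
// Hypothetical row shape; the real `statusReport` table has more columns.
type IncidentRow = { id: number; pageId: number; title: string };

// In-memory stand-in for the table so the sketch runs without a database.
const rows: IncidentRow[] = [];

function writeIncident(
  pageId: number,
  title: string,
): { id: number; status: "created" | "skipped" } {
  // Match on title + pageId, as the review suggests, before inserting.
  const existing = rows.find((r) => r.pageId === pageId && r.title === title);
  if (existing) return { id: existing.id, status: "skipped" };

  const inserted: IncidentRow = { id: rows.length + 1, pageId, title };
  rows.push(inserted);
  return { id: inserted.id, status: "created" };
}
```

Running the same import twice then yields `"created"` on the first pass and `"skipped"` on the retry. A provider-stable `sourceId`, where available, would be a stricter match key than the title.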
    export type ListWorkspacesInput = z.infer<typeof ListWorkspacesInput>;

    export const UpdateWorkspaceNameInput = z.object({
    name: z.string().min(1),
P2: name validation allows whitespace-only workspace names. Trim before min(1) so blank names are rejected.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/services/src/workspace/schemas.ts, line 15:
<comment>`name` validation allows whitespace-only workspace names. Trim before `min(1)` so blank names are rejected.</comment>
<file context>
@@ -0,0 +1,17 @@
+export type ListWorkspacesInput = z.infer<typeof ListWorkspacesInput>;
+
+export const UpdateWorkspaceNameInput = z.object({
+ name: z.string().min(1),
+});
+export type UpdateWorkspaceNameInput = z.infer<typeof UpdateWorkspaceNameInput>;
</file context>
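In zod the usual fix is to chain `.trim()` before `.min(1)`. The dependency-free sketch below shows why the order matters; `parseWorkspaceName` is a hypothetical helper for illustration, not part of the services package.

```typescript
// Trim first, then apply the length check, so "   " is rejected.
function parseWorkspaceName(input: string): string {
  const name = input.trim();
  if (name.length < 1) {
    throw new Error("Workspace name must not be blank");
  }
  return name;
}

// Without the trim, a whitespace-only name passes a bare length check.
const passesUntrimmedCheck = "   ".length >= 1;
```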
    * against that page's current component count so the remaining-capacity
    * warnings line up with what `run` will actually do.
    */
    pageId: z.number().int().optional(),
P2: Restrict pageId to positive integers. Accepting 0/negative values bypasses truthy-gated ownership and page-target logic, so invalid IDs are silently treated as missing IDs.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/services/src/import/schemas.ts, line 25:
<comment>Restrict `pageId` to positive integers. Accepting `0`/negative values bypasses truthy-gated ownership and page-target logic, so invalid IDs are silently treated as missing IDs.</comment>
<file context>
@@ -0,0 +1,42 @@
+ * against that page's current component count so the remaining-capacity
+ * warnings line up with what `run` will actually do.
+ */
+ pageId: z.number().int().optional(),
+});
+export type PreviewImportInput = z.infer<typeof PreviewImportInput>;
</file context>
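The truthy-gate problem can be illustrated without zod. In this sketch `resolveTarget` and `assertPageId` are hypothetical names; the schema-level fix would typically be `z.number().int().positive().optional()`.

```typescript
// A truthy gate treats 0 the same as "no id supplied".
function resolveTarget(pageId?: number): "existing-page" | "new-page" {
  return pageId ? "existing-page" : "new-page";
}

// Hypothetical guard: reject anything that is not a positive integer up
// front, so an invalid id becomes an error instead of a silent fallback.
function assertPageId(pageId?: number): number | undefined {
  if (pageId === undefined) return undefined;
  if (!Number.isInteger(pageId) || pageId <= 0) {
    throw new RangeError(`pageId must be a positive integer, got ${pageId}`);
  }
  return pageId;
}
```

Validating at the schema boundary keeps the downstream truthy checks safe, since the only falsy value that can reach them is `undefined`.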
    test("enforces the members plan cap on free workspace", async () => {
    // Free plan has `members: 1` — the owner already occupies that slot.
    await expect(
P2: Seed a member or existing pending invite before asserting the free-plan cap, otherwise this test will never hit LimitExceededError.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/services/src/invitation/__tests__/invitation.test.ts, line 87:
<comment>Seed a member or existing pending invite before asserting the free-plan cap, otherwise this test will never hit `LimitExceededError`.</comment>
<file context>
@@ -0,0 +1,220 @@
+
+ test("enforces the members plan cap on free workspace", async () => {
+ // Free plan has `members: 1` — the owner already occupies that slot.
+ await expect(
+ createInvitation({
+ ctx: freeCtx,
</file context>
Six items from Claude's review, going with the calls I leaned toward in the question-back:

**P2: tRPC `updateCustomDomain` wasteful `getPage` read.** Was calling `getPage(id)` (fires 3 batched relation queries: maintenances + components + groups) just to grab `customDomain` before the Vercel add/remove calls. Added a narrow `getPageCustomDomain` service helper: single indexed lookup, workspace-scoped, returns the string directly. Router swapped over. Service-layer authority preserved; no db reads leak into the router.

**P2: Connect `updateStatusPage` slug-race code drift.** Handler pre-checks slug to surface `slugAlreadyExistsError` (`Code.AlreadyExists`). The `updatePageGeneral` service call re-validates via `assertSlugAvailable` → `ConflictError` → `Code.InvalidArgument` in the race where two callers both clear the pre-check. Wrapped the call in `try/catch (ConflictError)` and rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying on the code get a consistent `AlreadyExists` whether they lose at the pre-check or at the inner tx.

**P2: Connect `createStatusPage` / `updateStatusPage` customDomain without Vercel sync.** Pre-existing behaviour (the direct-db handler had the same gap). Added a top-of-impl comment so it doesn't go unnoticed. The fix is a shared transport-layer helper the Connect handlers can reuse, out of scope for this migration PR to keep the behavioural blast radius small for external API consumers.

**P3: double cast `row as unknown as Page` in `create.ts`.** The drizzle insert-returning type and the `Page` type diverge on `authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs parsed `string[]`). Replaced the double casts with `selectPageSchema.parse(row)` which normalises the row into the shape callers expect. Cast-drift is now impossible to introduce silently.

**P3: `void ConflictError;` workaround.** Import was unused in `create.ts`; the `void` line was silencing the unused-import warning rather than fixing the cause. Removed both.

**P3: deprecated `passwordProtected` column.** Added a doc block on `updatePagePasswordProtection` flagging that the deprecated boolean column is intentionally not written here (the v1 REST read path derives it from `accessType` via `normalizePasswordProtected`). Prevents a future reader from mistaking the omission for an oversight and writing two sources of truth for the same signal.

Test coverage for the 5 untested update services (`updatePagePasswordProtection`, `updatePageCustomDomain`, `updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`) deferred to a follow-up per Claude's "not blocking" marker. The failing-edge behaviour is the critical bit, and `updatePagePasswordProtection` already has indirect coverage through the Connect handler tests on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
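The slug-race rethrow from the second P2 item can be sketched like this. The error classes and the `updateWithConsistentCode` wrapper are hypothetical stand-ins written for illustration; only the `ConflictError` and `slugAlreadyExistsError` names come from the commit message.

```typescript
// Stand-in for the service-layer error raised by the slug re-check.
class ConflictError extends Error {}

// Stand-in for the Connect error that carries `AlreadyExists`.
class AlreadyExistsError extends Error {
  readonly code = "AlreadyExists";
}

const slugAlreadyExistsError = (slug: string): AlreadyExistsError =>
  new AlreadyExistsError(`slug "${slug}" already exists`);

// Wrap the service call so a lost race surfaces the same error code as
// the handler's own pre-check. Other errors pass through untouched.
async function updateWithConsistentCode(
  slug: string,
  update: () => Promise<void>,
): Promise<void> {
  try {
    await update();
  } catch (err) {
    if (err instanceof ConflictError) throw slugAlreadyExistsError(slug);
    throw err;
  }
}
```

Clients that branch on the error code then see one outcome for a taken slug, regardless of which check caught it.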
Five findings from Cubic's second review cycle on this PR, all on files that entered this branch via the #2109 (status-page) and #2111 (import) squash-merges stacked on top. Fixing here so the cumulative state reaching main is clean.

**P1: `page/create.ts` double-encoded JSON configuration.** `page.configuration` is a drizzle `text("…", { mode: "json" })` column, so drizzle serialises objects automatically. Calling `JSON.stringify(configuration)` first stored a raw JSON string in the column, breaking any downstream read that expects an object (e.g. the appearance merge at `update.ts:185`). Dropped the wrap; drizzle handles it.

**P2: `page/schemas.ts` slug + customDomain validation weaker than insert schema.** `NewPageInput.slug`, `GetSlugAvailableInput.slug`, and `UpdatePageGeneralInput.slug` were `z.string().toLowerCase()`: no regex, no min-length. `UpdatePageCustomDomainInput.customDomain` was `z.string().toLowerCase()`: no format check. Meant the service would accept malformed slugs / URLs that `createPage` would then reject via `insertPageSchema`, or, worse, that `getSlugAvailable` would confidently return "available" for garbage. Exported the canonical `slugSchema` + `customDomainSchema` from `@openstatus/db/src/schema/pages/validation` and reused them across all four service inputs; db validation is now the single source of truth for page slug/domain shape.

**P2: `api/router/import.ts` nullish → optional contract narrowing.** The service's `PreviewImportInput`/`RunImportInput` used `.optional()` for the three provider page-id fields, which dropped the `null` acceptance the legacy router had via `.nullish()`. Existing clients sending `null` would have started hitting `Invalid input` errors after the import migration landed. Added a `nullishString` transform in the service schema that accepts `string | null | undefined` and normalises to `string | undefined` before it reaches `buildProviderConfig`; callers keep the broader contract, service internals stay ignorant of `null`.

**P2: `page/update.ts` empty array stored "" not null.** `authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to `null`, but `[].join(",")` returns `""` (empty string) which `??` treats as a value. Callers sending `authEmailDomains: []` to clear the column were persisting the empty string instead of nulling it, a misleading "present but blank" state. Switched to `|| null` on both array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the three clearing inputs (`undefined`, `null`, `[]`) all land on DB `NULL` while real non-empty joins pass through unchanged.

Test fixtures already use slugs ≥ 3 chars that match the regex, so the tightened validation doesn't break any existing assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
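The difference between `?? null` and `|| null` on a joined array, as described in the `page/update.ts` item above, comes down to how each operator treats the empty string. The helper names below are hypothetical and exist only to make the two behaviours comparable.

```typescript
type MaybeList = string[] | null | undefined;

// Previous behaviour: `??` only replaces null/undefined, so the empty
// string produced by [].join(",") is kept as a value.
const joinWithNullish = (values: MaybeList): string | null =>
  values?.join(",") ?? null;

// Fixed behaviour: `||` also replaces the empty string, so undefined,
// null, and [] all collapse to null.
const joinOrNull = (values: MaybeList): string | null =>
  values?.join(",") || null;
```

Non-empty lists are unaffected by the switch, since a non-empty joined string is truthy under both operators.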
* feat(services): Connect RPC notification handler onto services (catch-up) Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had been narrowing scope to tRPC only and deferring Connect handlers, which was piling up. This closes the notification Connect gap. ## What changed `apps/server/src/routes/rpc/services/notification/index.ts` — the five CRUD methods now delegate to `@openstatus/services/notification`: - `createNotification` → `createNotification` service (handles the plan-count limit, per-provider plan gate, and data-schema validation internally — the Connect-side `checkNotificationLimit` / `checkProviderAllowed` / `validateProviderDataConsistency` calls are gone). - `getNotification`, `listNotifications`, `updateNotification`, `deleteNotification` — thin proto-to-service-to-proto wrappers. - `updateNotification` reads the existing record via the service and fills in missing fields (Connect's update is partial; the service expects a full payload), then applies the update. Left inline: - `sendTestNotification` — calls `test-providers.ts` (external HTTP). - `checkNotificationLimit` RPC method — returns the count info via `./limits.ts` helpers (pure queries, no domain mutation). The local Connect helpers (`validateProviderDataConsistency`, `checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc `validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` / `getMonitorCountForNotification` / `getMonitorIdsForNotification`) are no longer imported by `index.ts`; they remain in their files because `test-providers.ts` and the unmigrated Connect monitor handler still reference some of them. ## Biome Added `apps/server/src/routes/rpc/services/notification/index.ts` to the `noRestrictedImports` scope. The directory-level glob isn't a fit because `limits.ts` and `test-providers.ts` legitimately need direct db access until their own follow-up migrations. 
## Deferred - **Connect monitor handler** (~880 lines, 6 jobType-specific create/update methods + 3 external-integration methods) — requires a much bigger refactor. Flagged as dedicated PR 4b; tracked separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(services): dedupe monitor ids in Connect createNotification response Cubic's P2 catch: the service dedupes `monitors` before the insert (via `validateMonitorIds` in the services package), but the Connect handler echoed `req.monitorIds` verbatim back in the response. For an input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response claimed `["1", "1", "2"]` — caller state diverges from persistence. Echo `Array.from(new Set(req.monitorIds))` instead so the response matches what's actually stored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(services): migrate page authoring (PR 7/N) (#2109) * feat(services): migrate page (status-page authoring) onto service layer Seventh domain. Migrates the 13 authoring procedures in `pageRouter` onto `@openstatus/services/page`. Deliberately scoped to authoring CRUD only: - `statusPage.ts` — public viewer endpoints (subscribe / get / uptime / report / verify / unsubscribe) are a separate surface that doesn't use the authenticated `ServiceContext`; dedicated follow-up. - Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500 lines with 18 methods (page CRUD + components + groups + subscribers + view). Too big for this PR; dedicated follow-up, same shape as the Connect monitor deferral. ## Services (`packages/services/src/page/`) - `createPage` / `newPage` — full vs minimal create; both enforce the `status-pages` plan cap and (for `createPage`) the per-access-type plan gates (password-protection, email-domain-protection, ip- restriction, no-index). - `deletePage` — FK cascade clears components / groups / reports / subscribers. - `listPages` — batched enrichment with `statusReports`. 
- `getPage` — enriched with `maintenances` / `pageComponents` / `pageComponentGroups`. - `getSlugAvailable` — pure check against `subdomainSafeList` + DB. - `updatePageGeneral` — with slug-uniqueness re-check on change. - `updatePageCustomDomain` — persists the DB change and returns the previous domain so the caller can diff. Vercel add/remove stays at the tRPC layer (external integration). - `updatePagePasswordProtection` — re-applies the same plan gates the `create` path uses. - `updatePageAppearance`, `updatePageLinks`, `updatePageLocales` (gated on `i18n` plan flag), `updatePageConfiguration`. - Audit action emitted for every mutation. ## tRPC (`packages/api/src/router/page.ts`) All 13 procedures are thin wrappers. `delete` catches `NotFoundError` for idempotency. `updateCustomDomain` orchestrates: 1. `getPage` (via service) to read the existing domain. 2. `addDomainToVercel` / `removeDomainFromVercel` as needed. 3. `updatePageCustomDomain` (via service) to persist. ## Enforcement - Biome scope adds `packages/api/src/router/page.ts`. The router imports `insertPageSchema` via the services re-export (`CreatePageInput`) so the db-import ban applies cleanly. - Subpath export `@openstatus/services/page`. ## Tests - `__tests__/page.test.ts` covers `newPage` happy / reserved / duplicate, `createPage` monitor attachment + cross-workspace monitor, `updatePageGeneral` rename + duplicate-slug conflict + cross-workspace, `updatePageLocales` plan gate, list / get / slug-available workspace isolation, delete cross-workspace NotFoundError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(services): migrate Connect status-page page CRUD onto services Extends PR #2109 to cover the Connect RPC status-page handler's page CRUD surface (create / get / list / update / delete), matching the migration that landed for tRPC's `pageRouter`. 
The other 13 methods (components, groups, subscribers, viewer) still read the db directly — they're separate domains that'll need their own services in follow-ups. - create / get / delete call into `@openstatus/services/page` and preserve the granular Connect errors (`statusPageNotFoundError`, `slugAlreadyExistsError`) by pre-checking before the service call or catching `NotFoundError` → re-throwing the richer variant. - list fetches via the service and paginates in-memory; status-page quota is bounded per workspace so the extra enrichment is negligible. - update loads the existing page via the service, then orchestrates the per-section updates (`updatePageGeneral`, `updatePageLinks`, `updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`, `updatePagePasswordProtection`) inside a shared transaction so a partial failure can't leave the page half-updated. Each service's internal `withTransaction` detects the pre-opened tx and skips nesting. - Proto-specific format validations (https icon URL, custom-domain regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied path stay at the handler — they don't exist in the zod insert schema and their error codes would change if deferred to the service. - `Page` from the service parses `authEmailDomains` / `allowedIpRanges` into arrays, while the converters (still used by the unmigrated methods) expect the comma-joined string form. `serviceToConverterPage` bridges the two shapes at the call sites that need it. Biome scope deliberately unchanged: the file still imports from `@openstatus/db` for the 13 legacy methods, so the override would light up the whole file. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(services/page): address Cubic review on #2109 Four issues flagged across two Cubic reviews: - `createPage` skipped `assertSlugAvailable`, so full-form creates could bypass reserved/duplicate slug validation and either create a duplicate or fail late on a DB constraint instead of the clean `ConflictError`. Added the check alongside the existing quota gate. - `createPage` passed `passwordProtected` / `allowedIpRanges` but not `allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index` plan gate on create. Now forwarded. - `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary strings. Mirrored the CIDR validation from `insertPageSchema` — bare IPs get `/32` appended, everything pipes through `z.cidrv4()`. - `updatePagePasswordProtection` wrote `authEmailDomains: input.authEmailDomains?.join(",")`, which evaluates to `undefined` when the caller clears the field. Drizzle treats `undefined` as "skip this column" on `.set()`, so stale email domains survived an access-type switch. Added the `?? null` fallback to match the neighboring `allowedIpRanges` line. This fixes the Connect `updateStatusPage` path where switching away from AUTHENTICATED sets `nextAuthEmailDomains = undefined` expecting the column to clear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110) * feat(services): migrate workspace / user / invitation / api-key (PR 8/N) Stacked on PR #2109. Eighth migration — four small domains consolidated into one PR because each is narrow (roughly two to five procedures) and they share no structural dependencies beyond already-migrated infrastructure. 
### Services (`packages/services/src/{workspace,user,invitation,api-key}/`) **workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors + notifications + page-components batched via drizzle relations), `listWorkspaces` (takes `userId` explicitly since `list` runs across every workspace the user has access to), `updateWorkspaceName`. **user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the paid- plan guardrail stays; removes non-owned memberships, sessions, OAuth accounts and blanks the PII columns inside a single tx). **invitation** — `createInvitation` (plan gate counts pending invites against the members cap so two outstanding invites can't both accept past the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken` (scoped by token **and** accepting email to prevent token-sharing), `acceptInvitation` (stamps acceptedAt + inserts membership atomically). **api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey` (workspace-scoped existence check inside the tx so concurrent revokes resolve to a consistent NotFound rather than a silent no-op), `listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a single IN query for creator enrichment), `verifyApiKey` + `updateApiKeyLastUsed` (no ctx required — the verify path runs before workspace resolution and callers pass an optional `db` override). ### tRPC (`packages/api/src/router/{workspace,user,invitation,apiKey}.ts`) All 14 procedures become thin `try { return await serviceFn(...) } catch { toTRPCError }` wrappers. Router shapes stay identical so the dashboard needs no changes. Connect + Slack don't expose these domains today; migrating their consumers is a follow-up. ### Enforcement Biome `noRestrictedImports` override adds the four router files. Subpath exports `@openstatus/services/{workspace,user,invitation,api-key}` added to the services package. 
### Cleanup Deletes `packages/api/src/service/apiKey.ts` and its tests — fully superseded by `packages/services/src/api-key/`. The auth middleware in `apps/server` has its own inline apiKey verification and is unaffected. ### Deliberately out of scope - **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of the migration surface. Stays as-is. - **`packages/api/src/service/{import,telegram-updates}.ts`** — import migration is PR 9; telegram-updates stays for a follow-up. ### Tests Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage counts, members cap hit on free plan, invitation token-mismatch rejection, accept idempotency, api-key creation returning a bcrypt hash, list creator enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy / bad- format / wrong-body paths, lastUsed debounce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(services): address Cubic review on #2110 Four issues flagged on PR 8: - **P1 — `invitation/accept.ts`**: the read-then-write pattern let two concurrent accepts both pass the `isNull(acceptedAt)` check and race through the membership insert. Replaced with a conditional UPDATE that re-asserts `isNull(acceptedAt)` in the WHERE clause and checks `.returning()` rowcount. The loser gets `ConflictError`, the tx aborts before membership inserts run. - **P2 — `api-key/create.ts`**: `createdById` was taken from input and the router spliced in `ctx.user.id`. Since that column is attribution data (who owns the key, who the audit row blames), trusting input would let any caller forge ownership. Derived from `ctx.actor` via `tryGetActorUserId`; actors without a resolvable user id (system / webhook / unlinked api-key) now get `UnauthorizedError` instead of a silent NULL write. `createdById` removed from the input schema. - **P2 — `invitation/delete.ts`**: audit row was emitted even when the DELETE matched zero rows (unknown id / wrong workspace). 
Switched to `.returning({ id })` and short-circuit before the audit emit so the log only reflects actual deletions. - **P2 — `invitation/list.ts`**: the `if (!input.email)` → `UnauthorizedError` branch in `getInvitationByToken` was unreachable because `z.email()` already rejects empty / malformed emails at `.parse()`. Removed the dead branch; the router keeps its own pre-call check for `ctx.user.email`, so the transport-level UnauthorizedError path is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(services): migrate import domain (PR 9/N) (#2111) * feat(services): migrate import domain (PR 9/N) Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line `packages/api/src/service/import.ts` orchestrator into the services package as its own `@openstatus/services/import` domain. ### Services (`packages/services/src/import/`) Split into focused files: - **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider discriminator + per-provider page-id fields live here; options schema is separately exported for callers that want to pre-validate. - **`provider.ts`** — `createProvider` factory + `buildProviderConfig` reshape helper, isolated from the orchestrator so adding a provider is a one-file change. - **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure mutation on the `ImportSummary` argument; no writes. - **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers. - **`phase-writers.ts`** — the seven phase writers (page / component groups / components / incidents / maintenances / monitors / subscribers). Each takes a `DB` explicitly so callers can thread a pre-opened tx; failing resources get `status: "failed"` with an error string rather than throwing. - **`preview.ts`** — dry-run only; validates credentials, runs the provider with `dryRun: true`, emits warnings. - **`run.ts`** — the orchestrator. 
Now owns the `pageId` ownership check (previously duplicated in the tRPC router) and emits exactly **one** `import.run` audit row regardless of outcome so partial / failed runs still show up in the audit signal. Deliberately *not* wrapped in `withTransaction` — imports can span minutes across dozens of writes and the existing UX is phase-level recovery. ### tRPC (`packages/api/src/router/import.ts`) 124 lines → 28 lines. The router is now a thin `previewImport` / `runImport` wrapper; the input schemas and all validation live in the service. The router-level `TRPCError`-throwing `pageId` ownership check moved into `runImport` so non-tRPC callers (Slack / future) get the same guard. ### Error shape changes - Provider validation failure: `TRPCError("BAD_REQUEST")` → `ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same. - Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` → `NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same. ### Tests - Unit tests for `addLimitWarnings` / `clampPeriodicity` / `computePhaseStatus` move to `packages/services/src/import/__tests__/`. - Router integration tests (`packages/api/src/router/import.test.ts`) that previously called `previewImport` / `runImport` directly to override workspace limits now route through `makeCaller(limitsOverride)` with an explicit `provider: "statuspage"` field. This also fixes four pre-existing TypeScript errors where those calls were missing the (required) provider discriminator. ### Enforcement - Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`. - Subpath export `@openstatus/services/import` added. - `@openstatus/importers` added to services deps; services `tsconfig.json` bumped to `moduleResolution: "bundler"` so the importers package-exports map resolves (same setting `packages/api` already uses). ### Cleanup Deletes `packages/api/src/service/import.ts` (1042 lines) and its test file (463 lines). 
Only `telegram-updates.ts` remains in `packages/api/src/service/` — that's slated for a follow-up PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(services/import): per-resource audit + Cubic fixes on #2111 Two changes folded together: ### Per-resource audit Every phase writer now emits one `emitAudit` row per *created* resource, matching what the domain services emit for normal CRUD: | Phase | Audit action | --- | --- | page | `page.create` | componentGroups | `page_component_group.create` | components | `page_component.create` | monitors | `monitor.create` | incidents | `status_report.create` + `status_report.add_update` per update | maintenances | `maintenance.create` | subscribers | `page_subscriber.create` Skipped resources don't emit (their original create audit already exists); failed resources don't emit (nothing was written); link-table rows (statusReportsToPageComponents etc.) don't emit (edges, not entities). Metadata always carries `source: "import"` + `provider: <name>` + `sourceId: <provider-id>` so the audit trail traces back to the source system. The rollup `import.run` audit still fires at the end — the per-resource rows give forensic granularity, the run-level row gives "this bulk operation happened" without scanning the full summary blob. For the change, phase writers now take a shared `PhaseContext = { ctx, tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator builds one `PhaseContext` per run and threads it through, giving each writer access to `ctx.actor` for audit attribution. `statusReportUpdate` writes now use `.returning({ id })` so the per-update audit can attribute the right row. ### Cubic review fixes - **`run.ts:130`** — phases after `page` kept their provider-assigned status when `targetPageId` was falsy but the user option wasn't `false`. Replaced the narrow `else if (option === false)` branches with a plain `else → phase.status = "skipped"`, matching what `subscribers` already did. 
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`, the phase was marked `"failed"` but individual resource statuses were left stale with no error string. Each resource is now marked `"skipped"` with `"Skipped: component limit reached (N)"`, matching `writeMonitorsPhase`. Phase-level status becomes `"skipped"` too (was `"failed"` — failed implied a writer error, this is really a plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig` had a `default:` that silently ran the Statuspage adapter for any unknown provider name, which would mask a typo by handing a non-Statuspage api key to the wrong adapter. Replaced with exhaustive `case "statuspage"` + `never`-typed default throw.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames `apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/` so the distinction between "the services layer" (owns business logic, lives in `packages/services`) and "Connect transport handlers" (thin proto → service → proto wrappers) is permanent and visible in the path. Keeping the old name invites the next developer to "just add one small thing" to a file under a `services/` folder months later; the rename makes the layering explicit.
### Changes
- `git mv` of the six domain subdirectories + their tests (health / maintenance / monitor / notification / status-page / status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope — the health handler has no db usage today; including it locks in that invariant.
### Still out of scope (follow-ups)
Rather than pretending the full "close the loop" deliverable is possible today, the biome.jsonc comment now enumerates exactly what remains unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is migrated (PR 7), but components / groups / subscribers / viewer (13 methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools, handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation (broadening to `router/**` + dropping `@openstatus/db` from `packages/api` and `apps/server`) is conditional on all of them landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the provider package round-tripped into resource data. Every other phase writer (monitor / components / subscriber) already reads `workspaceId` from `ctx.workspace.id`; this lines the page insert up with that pattern. Defends against the (unlikely) case where a provider mapper serialises the wrong workspace id into its output, since `ctx` is the authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.** Was taking it from input: the email scoped *which* invitation could be accepted, but not *who* the membership was inserted for. A caller with the right token+email could insert a membership under an arbitrary user id. Removed `userId` from `AcceptInvitationInput`; derived from `tryGetActorUserId(ctx.actor)`, throws `UnauthorizedError` for non-user actors. Mirrors the same pattern applied to `createApiKey.createdById` in the Cubic pass. Router and test updated accordingly.
**P2 — `getWorkspace` throws `NotFoundError` explicitly.** `findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing `ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape every other service uses. Unreachable in practice (ctx.workspace is resolved upstream) but the error shape was the only outlier; consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.** The new `createApiKey` path enforces a non-null creator, but legacy rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically safe — but drizzle types model the array as `number[]`. Filtering upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.** The WHERE previously allowed hard-deleting *accepted* invitations, wiping the "user was invited on X" breadcrumb. Added the `isNull(acceptedAt)` guard + doc comment explaining the audit-trail preservation intent.
**Doc-only — `deleteAccount` orphan comment.** Non-owner memberships are removed, but owner memberships + owned workspaces survive. Matches legacy behavior.
Added a scope-note docblock flagging that workspace cleanup is explicitly out of scope (belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.** The invite insert lets `role` fall through to the schema default (`member`). Matches legacy (which also only picked `email`). Comment added so the absence reads as deliberate rather than overlooked.
Minor — the concurrent-accept race test is covered by the conditional UPDATE + `ConflictError` path from the earlier P1 fix; mocking it reliably against SQLite is noisy and not worth the test complexity. Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.** After the null filter added in the previous commit, workspaces whose keys all pre-date the services migration (every `createdById` null) end up with `creatorIds === []`. Drizzle throws "At least one value must be provided" on an empty `inArray`, taking the whole endpoint down. Added an early return that maps `createdBy: undefined` when there are no non-null creator ids to look up.
**P2 — `getWorkspaceWithUsage` ZodError on missing row.** Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as `getWorkspace`, but without the `NotFoundError` guard that got added in the earlier pass. Added the guard. Also cleaned up the usage block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.** Completing the `createApiKey` / `acceptInvitation` pattern: account deletion must target `ctx.actor`, never an arbitrary id. Dropped `userId` from `DeleteAccountInput` (now an empty forward-compat shape), derived inside the service via `tryGetActorUserId`, throws `UnauthorizedError` for non-user actors. Router updated to stop passing it.
**P3 — `createInvitation` dev-token log could leak in tests.** Tightened the comment around the `process.env.NODE_ENV === "development"` guard to flag that strict equality is load-bearing — bun:test sets `NODE_ENV=test` and CI leaves it undefined, both of which correctly skip the log. No behavior change, just a clearer contract so the next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/router/import.ts` and `packages/services/src/import/limits.ts` — both live in the next PR up the stack (#2111 / feat/services-import) and will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.** Only `notification/index.ts` was in the `noRestrictedImports` override. Sibling files (`errors.ts`, `test-providers.ts`) were outside the migration guard, so new db imports could land in them without the lint failing. Broadened to `notification/**` and moved the two files that *legitimately* still read db (`limits.ts` querying workspace quotas, `converters.ts` needing db enum shapes for proto round-trip) into the `ignore` list. Future siblings are enforced by default rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.** `PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0` → walk started at `"30s"` (the fastest tier). Could return an interval faster than requested, violating the "never-faster-than-requested" invariant. Short-circuits now to the slowest allowed tier when the requested value isn't a known periodicity. Added unit tests covering the unknown-value + empty-allowed fallback paths.
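The `clampPeriodicity` short-circuit described above could look roughly like the following. This is only a sketch: the tier list beyond `"30s"`, the function signature, and the empty-allowed fallback are assumptions, not the code in `limits.ts`.

```typescript
// Hypothetical tier list, ordered fastest to slowest. Only "30s" as the
// fastest tier comes from the commit message; the other tiers are assumed.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

// Assumed fallback when the plan allows no tier at all.
const SLOWEST: Periodicity = PERIODICITY_ORDER[PERIODICITY_ORDER.length - 1];

function clampPeriodicity(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity {
  if (allowed.length === 0) return SLOWEST;

  // Order the allowed tiers fastest to slowest.
  const sorted = [...allowed].sort(
    (a, b) => PERIODICITY_ORDER.indexOf(a) - PERIODICITY_ORDER.indexOf(b),
  );
  const slowestAllowed = sorted[sorted.length - 1];

  const start = PERIODICITY_ORDER.indexOf(requested as Periodicity);
  // Unknown value: indexOf returns -1. Clamping that to 0 would start the
  // walk at the fastest tier, so short-circuit to the slowest allowed one.
  if (start === -1) return slowestAllowed;

  // Walk from the requested tier towards slower tiers; first allowed wins.
  for (let i = start; i < PERIODICITY_ORDER.length; i++) {
    const tier = PERIODICITY_ORDER[i];
    if (allowed.includes(tier)) return tier;
  }
  // Nothing allowed is as slow as the request (assumed behaviour).
  return slowestAllowed;
}
```

With this shape, an unrecognised value can never resolve to an interval faster than the slowest tier the plan permits.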
**P2 — component/monitor limit warnings counted total resources, not quota-consuming inserts.** If the import contained 4 components and 3 already existed (would be skipped as duplicates), the warning claimed `"Only X of 4 can be imported"` — but actually zero quota would be consumed by the 3 skips, so the real new-creation count might fit entirely. Reworded to `"Only N new components may be created … some of the M in the import may already exist and be skipped"`. Same treatment for the monitors warning. Preview stays DB-light (no per-resource existence checks); the warning now honestly conveys worst-case without misleading users about what will actually happen.
Test assertions updated to match the new wording with substring matches that aren't tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.** Was calling `getPage(id)` (fires 3 batched relation queries: maintenances + components + groups) just to grab `customDomain` before the Vercel add/remove calls. Added a narrow `getPageCustomDomain` service helper — single indexed lookup, workspace-scoped, returns the string directly. Router swapped over. Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.** Handler pre-checks slug to surface `slugAlreadyExistsError` (`Code.AlreadyExists`). The `updatePageGeneral` service call re-validates via `assertSlugAvailable` → `ConflictError` → `Code.InvalidArgument` in the race where two callers both clear the pre-check.
Wrapped the call in `try/catch (ConflictError)` and rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying on the code get a consistent `AlreadyExists` whether they lose at the pre-check or at the inner tx.
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain without Vercel sync.** Pre-existing behaviour (the direct-db handler had the same gap). Added a top-of-impl comment so it doesn't go unnoticed — the fix is a shared transport-layer helper the Connect handlers can reuse, out of scope for this migration PR to keep the behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.** The drizzle insert-returning type and the `Page` type diverge on `authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs parsed `string[]`). Replaced the double casts with `selectPageSchema.parse(row)` which normalises the row into the shape callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.** Import was unused in `create.ts`; the `void` line was silencing the unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.** Added a doc block on `updatePagePasswordProtection` flagging that the deprecated boolean column is intentionally not written here (the v1 REST read path derives it from `accessType` via `normalizePasswordProtected`). Prevents a future reader from mistaking the omission for an oversight and writing two sources of truth for the same signal.
Test coverage for the 5 untested update services (`updatePagePasswordProtection`, `updatePageCustomDomain`, `updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`) deferred to a follow-up per Claude's "not blocking" marker — the failing-edge behaviour is the critical bit, and `updatePagePasswordProtection` already has indirect coverage through the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.** `try { JSON.parse } catch { return {} }` was hiding any malformed output from `protoDataToDb` (which would be a programmer error, not user-input) behind a generic empty-object fallback. The downstream `validateNotificationData` then failed with a far less specific error. Let the throw propagate — `toConnectError` maps it to `Code.Internal`, which is the signal we want for "the helper itself misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.** Was echoing `Array.from(new Set(req.monitorIds))` on the happy path (correct, since the service validates + throws on invalid) but the approximation diverged from `updateNotification`'s re-fetch pattern. Now re-fetches via `getNotification` after create so the response reflects what's actually in the DB — one extra IN query per create, eliminates the approximation entirely, makes both handlers structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.** Only handler in the impl without a `try { … } catch { toConnectError }` wrap, so any thrown `ServiceError` / `ZodError` from `test-providers.ts` fell through to the interceptor's generic catch and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.** Drizzle infers `notification.data` as `string | null` (the column has `default("{}")` but no `.notNull()`). A legacy row with `NULL` in the column would crash `updateNotification` with `SyntaxError` during the partial-update read-modify-write. Added `?? "{}"` fallback and a comment pointing at the schema.
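A minimal sketch of the null-safe read-modify-write from the last finding. The row type and the helper name `mergeNotificationData` are hypothetical; only the `?? "{}"` fallback is taken from the commit message.

```typescript
// Hypothetical row shape: `data` is nullable because the column has a
// default but no NOT NULL constraint.
type NotificationRow = { id: number; data: string | null };

// Partial update: parse the stored JSON, falling back to "{}" for legacy
// rows where the column is NULL, then overlay the incoming fields.
function mergeNotificationData(
  existing: NotificationRow,
  patch: Record<string, unknown>,
): string {
  const current = JSON.parse(existing.data ?? "{}") as Record<string, unknown>;
  return JSON.stringify({ ...current, ...patch });
}
```

The fallback keeps the merge on the object path regardless of what the legacy row holds, so the patch is always applied to a well-formed record.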
Cubic's single finding from the earlier pass (dedupe of `req.monitorIds` in the create response) was already applied in `b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on files that entered this branch via the #2109 (status-page) and #2111 (import) squash-merges stacked on top. Fixing here so the cumulative state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.** `page.configuration` is a drizzle `text("…", { mode: "json" })` column — drizzle serialises objects automatically. Calling `JSON.stringify(configuration)` first stored a raw JSON string in the column, breaking any downstream read that expects an object (e.g. the appearance merge at `update.ts:185`). Dropped the wrap; drizzle handles it.
**P2 — `page/schemas.ts` slug + customDomain validation weaker than insert schema.** `NewPageInput.slug`, `GetSlugAvailableInput.slug`, and `UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no regex, no min-length. `UpdatePageCustomDomainInput.customDomain` was `z.string().toLowerCase()` — no format check. Meant the service would accept malformed slugs / URLs that `createPage` would then reject via `insertPageSchema`, or — worse — that `getSlugAvailable` would confidently return "available" for garbage. Exported the canonical `slugSchema` + `customDomainSchema` from `@openstatus/db/src/schema/pages/validation` and reused them across all four service inputs; db validation is now the single source of truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.** The service's `PreviewImportInput`/`RunImportInput` used `.optional()` for the three provider page-id fields, which dropped the `null` acceptance the legacy router had via `.nullish()`.
Existing clients sending `null` would have started hitting `Invalid input` errors after the import migration landed. Added a `nullishString` transform in the service schema that accepts `string | null | undefined` and normalises to `string | undefined` before it reaches `buildProviderConfig` — callers keep the broader contract, service internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.** `authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to `null`, but `[].join(",")` returns `""` (empty string) which `??` treats as a value. Callers sending `authEmailDomains: []` to clear the column were persisting the empty string instead of nulling it — misleading "present but blank" state. Switched to `|| null` on both array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the three clearing inputs — `undefined`, `null`, `[]` — all land on DB `NULL` while real non-empty joins pass through unchanged.
Test fixtures already use slugs ≥ 3 chars that match the regex, so the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
## Services (`packages/services/src/page-component/`)
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
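The removal pass in step 4 can be sketched as a pure function. The types, the function name `componentsToRemove`, and the reading of "based on whether the input carries ids" (a static component survives only if its id appears in the input) are assumptions for illustration, not the service's actual code.

```typescript
// Hypothetical shapes; the real rows carry more columns.
type ExistingComponent = {
  id: number;
  type: "monitor" | "static";
  monitorId: number | null;
};
type InputComponent = {
  id?: number;
  type: "monitor" | "static";
  monitorId?: number;
};

// Returns the ids of existing components that the reconcile pass would drop.
function componentsToRemove(
  existing: ExistingComponent[],
  input: InputComponent[],
): number[] {
  // Monitor components are matched by monitorId.
  const keepMonitorIds = new Set(
    input.filter((c) => c.type === "monitor").map((c) => c.monitorId),
  );
  // Static components are matched by their own id (assumed interpretation).
  const keepStaticIds = new Set(
    input.filter((c) => c.type === "static").map((c) => c.id),
  );
  return existing
    .filter((c) =>
      c.type === "monitor"
        ? c.monitorId === null || !keepMonitorIds.has(c.monitorId)
        : !keepStaticIds.has(c.id),
    )
    .map((c) => c.id);
}
```

Static inputs without an id are treated as new rows here, so they never protect an existing component from removal.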
## Surface
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
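The idempotent `delete` wrapper might be shaped like this. `NotFoundError` below is a local stand-in for the services error class, and the `DeleteService` signature and the empty-array return on success are assumptions.

```typescript
// Local stand-in for the services package's error class.
class NotFoundError extends Error {}

// Hypothetical service signature: rejects with NotFoundError when the row
// does not exist in the caller's workspace.
type DeleteService = (id: number) => Promise<void>;

async function deleteProcedure(
  deleteService: DeleteService,
  id: number,
): Promise<unknown[]> {
  try {
    await deleteService(id);
  } catch (err) {
    // Swallow only "already gone"; every other failure still surfaces.
    if (!(err instanceof NotFoundError)) throw err;
  }
  // Empty array mirrors the old `.returning()`-shaped response.
  return [];
}
```

Deleting the same component twice therefore resolves both times, while a permission or database failure is still reported to the caller.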
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
## Tests
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
## What changed
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
## Biome
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
## Deferred
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
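For illustration, the dedupe behaves as follows on the example input from the message; the helper name `dedupeMonitorIds` is hypothetical.

```typescript
// A Set keeps the first occurrence of each value in insertion order, so the
// result lists each monitor id once, in the order the caller sent them.
function dedupeMonitorIds(ids: readonly string[]): string[] {
  return Array.from(new Set(ids));
}

const echoed = dedupeMonitorIds(["1", "1", "2"]);
console.log(echoed); // [ '1', '2' ]
```

A later commit in this log replaces the echo with a re-fetch via `getNotification`, so this helper only describes the interim fix.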
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
## Services (`packages/services/src/page/`)
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection, ip-
restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
## tRPC (`packages/api/src/router/page.ts`)
All 13 procedures are thin wrappers. `delete` catches `NotFoundError`
for idempotency. `updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
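The three orchestration steps above can be sketched with injected dependencies. The `Deps` signatures, the remove-before-add ordering, and the use of an empty string for "no domain" are assumptions; only the function names come from the commit message.

```typescript
// Dependencies are injected so the sketch stays self-contained. The names
// mirror the commit message, the signatures are assumed.
type Deps = {
  getPage: (id: number) => Promise<{ customDomain: string }>;
  addDomainToVercel: (domain: string) => Promise<void>;
  removeDomainFromVercel: (domain: string) => Promise<void>;
  updatePageCustomDomain: (input: {
    pageId: number;
    customDomain: string;
  }) => Promise<void>;
};

async function updateCustomDomain(
  deps: Deps,
  pageId: number,
  next: string,
): Promise<void> {
  // 1. Read the existing domain through the service.
  const { customDomain: previous } = await deps.getPage(pageId);

  // 2. Sync the external integration only when the domain changes.
  if (previous !== next) {
    if (previous) await deps.removeDomainFromVercel(previous);
    if (next) await deps.addDomainToVercel(next);
  }

  // 3. Persist through the service.
  await deps.updatePageCustomDomain({ pageId, customDomain: next });
}
```

Keeping the Vercel calls in the router and the persistence in the service matches the split described above: the service owns the database write, the transport layer owns the external integration.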
## Enforcement
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
## Tests
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
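The shared-transaction behaviour described for `update` in the list above can be illustrated with a toy `withTransaction`. How the real helper detects a pre-opened transaction is not stated in the message; carrying it on the context object, as done here, is an assumption, and all types are stand-ins.

```typescript
// Stand-in types: `Tx` plays the role of a database transaction handle.
type Tx = { id: number };
type Ctx = { tx?: Tx };

let opened = 0;

// Stand-in for opening a real transaction.
async function openTransaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
  opened += 1;
  return fn({ id: opened });
}

// Reuses a transaction already present on the context, otherwise opens one.
async function withTransaction<T>(
  ctx: Ctx,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  if (ctx.tx) return fn(ctx.tx);
  return openTransaction(fn);
}

// Two section services that each wrap their work in withTransaction.
async function updatePageGeneral(ctx: Ctx): Promise<number> {
  return withTransaction(ctx, async (tx) => tx.id);
}
async function updatePageLinks(ctx: Ctx): Promise<number> {
  return withTransaction(ctx, async (tx) => tx.id);
}

// Handler-level orchestration: open once, thread the tx through the context.
async function updateStatusPage(ctx: Ctx): Promise<number[]> {
  return withTransaction(ctx, async (tx) => {
    const shared: Ctx = { ...ctx, tx };
    return [await updatePageGeneral(shared), await updatePageLinks(shared)];
  });
}
```

Both section updates observe the same transaction id, so a failure in the second one would roll back the first along with it.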
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
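The `?? null` fix in the last item comes down to how optional chaining interacts with a cleared field. A small sketch, with the hypothetical helper name `toColumn`:

```typescript
// `values?.join(",")` is `undefined` when the caller clears the field, and an
// `undefined` value in `.set()` means "leave the column alone". Falling back
// to `null` makes the clear explicit.
function toColumn(values: string[] | null | undefined): string | null {
  return values?.join(",") ?? null;
}

console.log(toColumn(undefined)); // null
console.log(toColumn(["a.com", "b.com"])); // a.com,b.com
```

An empty array still produces `""` with this form; a later fix in this log switches to `|| null` so that `[]` clears the column as well.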
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
### Services (`packages/services/src/{workspace,user,invitation,api-key}/`)
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the paid-
plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
### tRPC (`packages/api/src/router/{workspace,user,invitation,apiKey}.ts`)
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
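A rough sketch of the thin-wrapper pattern. `ServiceError`, `TransportError`, and `toTransportError` are local stand-ins for the services error classes, `TRPCError`, and `toTRPCError`; the code mapping shown is an assumption.

```typescript
type ServiceCode = "NOT_FOUND" | "FORBIDDEN" | "CONFLICT" | "UNAUTHORIZED";

// Stand-in for the services package's error hierarchy.
class ServiceError extends Error {
  readonly code: ServiceCode;
  constructor(code: ServiceCode, message: string) {
    super(message);
    this.code = code;
  }
}

// Stand-in for the transport-level error (TRPCError in the real router).
class TransportError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Known service errors keep their code; anything else becomes a generic 500.
function toTransportError(err: unknown): TransportError {
  if (err instanceof ServiceError) {
    return new TransportError(err.code, err.message);
  }
  return new TransportError("INTERNAL_SERVER_ERROR", "Unexpected error");
}

// The wrapper every procedure body reduces to.
async function callService<T>(serviceFn: () => Promise<T>): Promise<T> {
  try {
    return await serviceFn();
  } catch (err) {
    throw toTransportError(err);
  }
}
```

Because the mapping lives in one function, each router procedure stays a one-liner and error codes remain consistent across domains.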
### Enforcement
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
### Cleanup
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
### Deliberately out of scope
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
### Tests
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy / bad-
format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
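
The conditional-UPDATE fix for `acceptInvitation` can be pictured with a small in-memory sketch. `InvitationRow`, `markAccepted` and `acceptInvitationSketch` are hypothetical stand-ins, not the service code; the real path runs a drizzle UPDATE inside a transaction.

```typescript
// Hypothetical in-memory stand-in for the invitation table.
type InvitationRow = { id: number; acceptedAt: Date | null };

class ConflictError extends Error {}

// Mimics an UPDATE whose WHERE clause re-asserts `acceptedAt IS NULL` and
// returns the ids it touched: the check and the write happen in one step.
function markAccepted(rows: InvitationRow[], id: number, now: Date): number[] {
  const updatedIds: number[] = [];
  for (const row of rows) {
    if (row.id === id && row.acceptedAt === null) {
      row.acceptedAt = now;
      updatedIds.push(row.id);
    }
  }
  return updatedIds;
}

function acceptInvitationSketch(
  rows: InvitationRow[],
  memberships: number[],
  invitationId: number,
  userId: number,
): void {
  const updatedIds = markAccepted(rows, invitationId, new Date());
  // Zero returned rows means another accept already stamped the invitation.
  if (updatedIds.length === 0) {
    throw new ConflictError("invitation already accepted");
  }
  // Only the winner reaches the membership insert.
  memberships.push(userId);
}
```

Calling `acceptInvitationSketch` twice for the same invitation inserts one membership and throws `ConflictError` on the second call, which is the behaviour the P1 fix describes for the losing caller.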
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
### Services (`packages/services/src/import/`)
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
### tRPC (`packages/api/src/router/import.ts`)
124 lines → 28 lines. The router is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
### Error shape changes
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
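
As a rough illustration of why both changes are net-same, here is a simplified sketch of the service-error-to-transport mapping. `toTransportCode` and `callService` are hypothetical names; the real `toTRPCError` builds an actual `TRPCError`, and the fallback code for unrecognised errors is an assumption.

```typescript
// Stand-ins for the service error classes named above.
class NotFoundError extends Error {}
class ValidationError extends Error {}

type TransportCode = "NOT_FOUND" | "BAD_REQUEST" | "INTERNAL_SERVER_ERROR";

// Hypothetical mapper: returns the tRPC code as a string instead of a TRPCError.
function toTransportCode(err: unknown): TransportCode {
  if (err instanceof NotFoundError) return "NOT_FOUND";
  if (err instanceof ValidationError) return "BAD_REQUEST";
  // Assumption: anything unrecognised is treated as an internal error.
  return "INTERNAL_SERVER_ERROR";
}

// Thin wrapper in the spirit of the routers: call the service, translate failures.
function callService<T>(
  serviceFn: () => T,
): { ok: true; data: T } | { ok: false; code: TransportCode } {
  try {
    return { ok: true, data: serviceFn() };
  } catch (err) {
    return { ok: false, code: toTransportCode(err) };
  }
}
```

The service throws a transport-agnostic error and the wrapper translates it at the edge, so a client sees the same code as before the migration.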
### Tests
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
### Enforcement
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
### Cleanup
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
### Per-resource audit
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action
| --- | ---
| page | `page.create`
| componentGroups | `page_component_group.create`
| components | `page_component.create`
| monitors | `monitor.create`
| incidents | `status_report.create` + `status_report.add_update` per update
| maintenances | `maintenance.create`
| subscribers | `page_subscriber.create`
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
To support this, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
### Cubic review fixes
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a non-
Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
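
The exhaustive-switch pattern from the `provider.ts` fix looks roughly like the sketch below. `ProviderName`, `assertNever` and `adapterLabelFor` are hypothetical names used for illustration; the real factory returns a provider adapter rather than a string.

```typescript
// Assumption: only one provider exists today.
type ProviderName = "statuspage";

// A `never`-typed parameter makes the compiler flag any unhandled union member.
function assertNever(value: never): never {
  throw new Error(`Unknown provider: ${String(value)}`);
}

function adapterLabelFor(providerName: ProviderName): string {
  switch (providerName) {
    case "statuspage":
      return "statuspage-adapter";
    default:
      // Reached only when an untyped caller passes an unknown name at runtime.
      return assertNever(providerName);
  }
}
```

Adding a second member to `ProviderName` without a matching `case` becomes a compile error at the `assertNever` call, and a typo that slips past the type system throws instead of silently running the Statuspage adapter.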
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
### Changes
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
### Still out of scope (follow-ups)
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one
documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor: no dedicated concurrent-accept race test was added. The race itself
is closed by the conditional UPDATE + `ConflictError` path from the earlier
P1 fix; mocking it reliably against SQLite is noisy and not worth the test
complexity. Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
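
A minimal sketch of the null filter plus early return, assuming a hypothetical `findCreatorsByIds` lookup that rejects an empty id list the way an empty `inArray` does. The row and creator shapes are illustrative, not the real schema.

```typescript
type ApiKeyRow = { id: number; createdById: number | null };
type Creator = { id: number; email: string };
type EnrichedApiKey = ApiKeyRow & { createdBy: Creator | undefined };

function enrichWithCreators(
  keys: ApiKeyRow[],
  findCreatorsByIds: (ids: number[]) => Creator[],
): EnrichedApiKey[] {
  // Collect distinct, non-null creator ids (legacy rows may carry null).
  const creatorIds: number[] = [];
  for (const key of keys) {
    if (key.createdById !== null && creatorIds.indexOf(key.createdById) === -1) {
      creatorIds.push(key.createdById);
    }
  }
  // Early return: never hand an empty list to the IN-style lookup.
  if (creatorIds.length === 0) {
    return keys.map((key) => ({ ...key, createdBy: undefined }));
  }
  const creators = findCreatorsByIds(creatorIds);
  return keys.map((key) => ({
    ...key,
    createdBy: creators.find((c) => c.id === key.createdById),
  }));
}
```

With every `createdById` null, the lookup is skipped entirely and each key comes back with `createdBy: undefined`.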
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at
`packages/api/src/router/import.ts` and
`packages/services/src/import/limits.ts`. Both live in the next PR up the
stack (#2111 / feat/services-import) and will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Short-circuits now to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
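
The unknown-value short-circuit can be sketched as follows. The tier list, the `clampPeriodicitySketch` name, and both fallback choices marked below are assumptions for illustration; the real `clampPeriodicity` may differ in detail.

```typescript
// Assumed tier list, fastest first; the real PERIODICITY_ORDER may differ.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicitySketch(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity {
  // Slowest allowed tier. Assumption: an empty `allowed` list falls back to "1h".
  let slowestAllowed: Periodicity = "1h";
  for (const tier of PERIODICITY_ORDER) {
    if (allowed.indexOf(tier) !== -1) slowestAllowed = tier;
  }

  const order: readonly string[] = PERIODICITY_ORDER;
  const start = order.indexOf(requested);
  // Unknown value: indexOf returns -1, so short-circuit instead of clamping to index 0.
  if (start === -1) return slowestAllowed;

  // Walk from the requested tier toward slower tiers, never toward faster ones.
  for (const tier of PERIODICITY_ORDER.slice(start)) {
    if (allowed.indexOf(tier) !== -1) return tier;
  }
  // Assumption: nothing at or slower than the request is allowed.
  return slowestAllowed;
}
```

The pre-fix behaviour corresponds to replacing the `start === -1` branch with `Math.max(start, 0)`, which starts the walk at the fastest tier.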
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
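
A simplified, synchronous sketch of the rethrow. `SlugAlreadyExistsError` stands in for the handler's `slugAlreadyExistsError(req.slug)` helper and `updateSlugWithConsistentError` is a hypothetical name; the real handler is async and produces a Connect error.

```typescript
class ConflictError extends Error {}

// Hypothetical stand-in for the `Code.AlreadyExists` error the handler raises.
class SlugAlreadyExistsError extends Error {
  slug: string;
  constructor(slug: string) {
    super(`slug "${slug}" already exists`);
    this.slug = slug;
  }
}

function updateSlugWithConsistentError(
  slug: string,
  updatePageGeneral: (slug: string) => void,
): void {
  try {
    updatePageGeneral(slug);
  } catch (err) {
    // Losing the race inside the service surfaces the same error as the pre-check.
    if (err instanceof ConflictError) throw new SlugAlreadyExistsError(slug);
    // Anything else keeps its original shape.
    throw err;
  }
}
```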
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would crash `updateNotification` with `SyntaxError` during
the partial-update read-modify-write. Added `?? "{}"` fallback and a
comment pointing at the schema.
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
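
The double-encoding effect is easy to reproduce in isolation. In this sketch `writeJsonColumn` and `readJsonColumn` are simplified stand-ins for what a JSON-mode text column does on write and read (an assumption about the column's behaviour, not drizzle's code), and the `theme` key is made up.

```typescript
// Stand-in for a JSON-mode text column: serialise once on write, parse once on read.
function writeJsonColumn(value: unknown): string {
  return JSON.stringify(value);
}
function readJsonColumn(raw: string): unknown {
  return JSON.parse(raw);
}

const pageConfiguration = { theme: "dark" };

// Buggy path: stringifying before the column applies its own serialisation.
const doubleEncoded = readJsonColumn(writeJsonColumn(JSON.stringify(pageConfiguration)));
// Fixed path: hand the object to the column as-is.
const singleEncoded = readJsonColumn(writeJsonColumn(pageConfiguration));

console.log(typeof doubleEncoded); // "string"
console.log(typeof singleEncoded); // "object"
```

A reader that spreads or merges the stored value gets a string in the first case, which is why the downstream merge breaks.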
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
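
The difference between the two operators on an empty array, as a standalone sketch (the function names are hypothetical):

```typescript
type MaybeList = string[] | null | undefined;

// Pre-fix shape: `??` only replaces null/undefined, so "" slips through.
function joinOrNullBuggy(values: MaybeList): string | null {
  return values?.join(",") ?? null;
}

// Post-fix shape: `||` also replaces the empty string produced by [].join(",").
function joinOrNull(values: MaybeList): string | null {
  return values?.join(",") || null;
}

console.log(JSON.stringify(joinOrNullBuggy([]))); // ""
console.log(joinOrNull([])); // null
console.log(joinOrNull(["a.com", "b.com"])); // a.com,b.com
```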
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
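
A minimal sketch of the sequential approach, with a hypothetical `insertGroup` callback standing in for a single-row insert that returns the new id. The input and row shapes are illustrative only.

```typescript
type GroupInput = { label: string; componentNames: string[] };
type ComponentRow = { componentName: string; groupId: number };

function insertGroupsSequentially(
  groups: GroupInput[],
  insertGroup: (label: string) => number, // hypothetical: inserts one row, returns its id
): ComponentRow[] {
  const componentRows: ComponentRow[] = [];
  for (const group of groups) {
    // The id is captured per group, so pairing never depends on result ordering.
    const groupId = insertGroup(group.label);
    for (const componentName of group.componentNames) {
      componentRows.push({ componentName, groupId });
    }
  }
  return componentRows;
}
```

Because each component row is built right after its own group insert returns, there is no index-based `newGroups[i]` lookup left to go wrong.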
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. Use
the joined value directly — one round-trip instead of two, and
eliminates the read-skew window where a just-renamed workspace
could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
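
The fallback can be sketched as a small pure function. `resolveImportAuditEntity` and the `AuditEntity` shape are hypothetical; only the page-or-workspace choice comes from the description above.

```typescript
type AuditEntity =
  | { entityType: "page"; entityId: number }
  | { entityType: "workspace"; entityId: number };

function resolveImportAuditEntity(
  targetPageId: number | undefined,
  workspaceId: number,
): AuditEntity {
  // Attribute to the page only when the page phase actually produced an id.
  if (targetPageId !== undefined) {
    return { entityType: "page", entityId: targetPageId };
  }
  // Otherwise attribute to the workspace instead of a placeholder page id.
  return { entityType: "workspace", entityId: workspaceId };
}
```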
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
**P2 — `writeComponentsPhase` silent monitor→static fallback.**
When the source monitor failed to resolve (e.g. `includeMonitors ===
false`), the component was silently degraded to `type = "static"`
and reported as `created` with no explanation. Other phase writers
populate `resource.error` on any degrade path. Added a matching
error string pointing at the source monitor id (or lack thereof) so
the summary conveys the degrade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): revert discriminated union to flat + .refine
The previous commit switched `componentInput` in the service schema to
a `z.discriminatedUnion("type", [...])` to get a clean `ZodError` at
parse time for the monitor/static invariant. That produced a narrowed
TS shape (`type: "monitor"` → required `monitorId: number`) that every
caller had to match — including the dashboard form, where
react-hook-form can't model discriminated unions cleanly and would
have needed a flat→union adapter at submit time. The ripple was
user-visible frontend churn for a schema-layer concern.
Switched back to a flat `z.object` + cross-field `.refine` on
`(type, monitorId)`. Same parse-time rejection Cubic asked for
(ZodError with a specific path, not an opaque SQLite CHECK failure),
but the inferred TS type stays flat so callers — router input, RHF
form values — keep their existing shape.
Also restored the downstream `&& c.monitorId` guards and
`as number[]` casts in `update-order.ts`. With a flat schema, TS
still sees `monitorId: number | null | undefined` on the monitor
branch; the refine rejects violating input at parse time, but the
guard is needed to narrow for the type system. Matches the
pre-migration shape exactly.
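A plain-TypeScript sketch of the trade-off, deliberately written without zod: `checkComponentInput` stands in for the cross-field `.refine`, and `ComponentInput` / `monitorIdsOf` are hypothetical names. It only illustrates why the guard and cast are still needed when the type stays flat.

```typescript
// Flat shape: `monitorId` stays optional on every branch of `type`.
type ComponentInput = {
  type: "monitor" | "static";
  monitorId?: number | null;
  name: string;
};

// Stand-in for the cross-field refine: returns the offending path, or
// null when the (type, monitorId) pair is consistent.
function checkComponentInput(c: ComponentInput): string | null {
  const missingId = c.monitorId === null || c.monitorId === undefined;
  return c.type === "monitor" && missingId ? "monitorId" : null;
}

// The validator rejects bad input at runtime, but the static type is
// unchanged, so downstream code still needs a guard plus a cast.
function monitorIdsOf(components: readonly ComponentInput[]): number[] {
  return components
    .filter((c) => c.type === "monitor" && Boolean(c.monitorId))
    .map((c) => c.monitorId as number);
}
```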
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): reconcile component links on idempotent skip
The idempotency checks in `writeIncidentsPhase` and
`writeMaintenancesPhase` added in the previous pass correctly avoid
duplicate status-report / maintenance rows on rerun, but `continue`-d
out of the writer before the component-link insertion block. The
failure mode this leaves open:
1. Run 1: component phase uses per-resource catch, so a single
component can fail and leave `componentIdMap` partial.
2. The report/maintenance is written with a subset of the intended
links — only the entries whose source id resolved in the map.
3. Run 2: the previously-failed component now succeeds and lands in
`componentIdMap`. The report/maintenance idempotency check hits,
`continue` fires, and the still-missing link is never written.
Both join tables (`statusReportsToPageComponents`,
`maintenancesToPageComponents`) have a composite primary key on
`(parentId, pageComponentId)`. Running the same link-build pass on
the skip path with `.onConflictDoNothing()` is a no-op for the links
already present and adds any that resolved this time round. Matches
the "reruns converge to correct state" model that motivated the
idempotency checks in the first place.
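The convergence argument can be checked with an in-memory stand-in for the join table. This is a sketch under stated assumptions: `LinkTable` and `buildLinks` are hypothetical, and the `Set` only mimics the composite primary key plus `ON CONFLICT DO NOTHING` semantics.

```typescript
type Link = { parentId: number; pageComponentId: number };

// In-memory stand-in for a join table with a composite primary key on
// (parentId, pageComponentId).
class LinkTable {
  private rows = new Set<string>();

  // Mimics INSERT ... ON CONFLICT DO NOTHING: existing pairs are left
  // alone, new pairs are added. Returns how many rows were inserted.
  insertOnConflictDoNothing(links: readonly Link[]): number {
    let inserted = 0;
    for (const link of links) {
      const key = `${link.parentId}:${link.pageComponentId}`;
      if (!this.rows.has(key)) {
        this.rows.add(key);
        inserted += 1;
      }
    }
    return inserted;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Resolve source component ids through the map; unresolved ids are dropped.
function buildLinks(
  parentId: number,
  sourceIds: readonly string[],
  componentIdMap: ReadonlyMap<string, number>,
): Link[] {
  const links: Link[] = [];
  for (const sourceId of sourceIds) {
    const pageComponentId = componentIdMap.get(sourceId);
    if (pageComponentId !== undefined) {
      links.push({ parentId, pageComponentId });
    }
  }
  return links;
}

const table = new LinkTable();
const sourceIds = ["a", "b"];
// Run 1: component "b" failed, so only "a" is in the map.
const run1 = table.insertOnConflictDoNothing(
  buildLinks(1, sourceIds, new Map([["a", 10]])),
);
// Run 2: "b" now resolves; the skip path reruns the same link pass.
const run2 = table.insertOnConflictDoNothing(
  buildLinks(1, sourceIds, new Map([["a", 10], ["b", 11]])),
);
```

Each run inserts one row, and any further rerun with the same map inserts none.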
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate notification CRUD onto service layer
Fifth domain. Migrates the four CRUD procedures — `list`, `new`,
`updateNotifier`, `delete` — of the notification tRPC router onto
`@openstatus/services/notification`. The three integration-helper
procedures (`sendTest`, `createTelegramToken`, `getTelegramUpdates`)
stay inline at the tRPC layer and are explicitly out of scope for this
migration.
## Services (`packages/services/src/notification/`)
- `createNotification` — enforces plan limits on both the channel count
(`notification-channels`) and the provider itself (`sms` / `pagerduty`
/ `opsgenie` / `grafana-oncall` / `whatsapp` require plan flags);
validates the loose `data` payload against `NotificationDataSchema`;
validates monitor ids are in-workspace and not soft-deleted.
- `updateNotification` — replaces name / data / monitor associations in
a single transaction with the same validation rules as create.
- `deleteNotification` — hard-delete (FK cascade clears associations).
The tRPC wrapper swallows `NotFoundError` to preserve the old
idempotent behaviour.
- `listNotifications`, `getNotification` — batched IN query enriches
monitors per notification. Monitor enrichment is workspace-scoped and
filters soft-deleted monitors for defence-in-depth.
- All mutations run inside `withTransaction`, emit `emitAudit`.
## Surface migrated
- **tRPC** (`packages/api/src/router/notification.ts`): `list` / `new` /
`updateNotifier` / `delete` become thin service wrappers. `sendTest`
+ `createTelegramToken` + `getTelegramUpdates` are unchanged.
## Out of scope (flagged for follow-up)
- **`sendTest` migration** — the dispatch switch imports from 10
`@openstatus/notification-*` packages. Moving it into services would
pull those as direct deps; the plan's phrasing ("Channel CRUD +
test-dispatch") allows this as a later extraction.
- **`createTelegramToken` / `getTelegramUpdates`** — redis + external
Telegram API helpers; transport UX, not domain operations.
- **Biome scope for `notification.ts`** — the file still imports
`@openstatus/db/src/schema` for the `sendTest` provider data schemas.
Will land with the sendTest migration follow-up.
- **Connect RPC notification handler** — stays on its own helpers;
follow-up aligned with PR 4's Connect deferral.
## Tests
- `__tests__/notification.test.ts` covers create (including
`ValidationError` on malformed data, `LimitExceededError` on gated
provider, `ForbiddenError` on cross-workspace monitor), update
(association replacement, cross-workspace `NotFoundError`), delete,
list/get workspace isolation + monitor enrichment scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Cubic review
- **Update flow plan gate** (update.ts) — `updateNotification` was
skipping `assertProviderAllowed`, so a user who downgraded their plan
could still edit a notification configured with a now-restricted
provider. Re-check against the stored `existing.provider` to match
the create-time gate.
- **Provider / data match** (internal.ts) — `NotificationDataSchema` is
a union, so `{ provider: "discord", data: { slack: "…" } }` passed
the union check even though the payload key doesn't match the
provider. `validateNotificationData` now takes the provider and
asserts `provider in data` after the top-level parse. Applied in
both `create` and `update` (update uses the stored provider since
the API doesn't allow provider changes).
Added a test for the mismatch case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): tighten data validation against provider schema
Cubic's follow-up on the previous fix was right: checking only
`provider in data` isn't enough. `NotificationDataSchema` is a union,
so a payload like `{ discord: "not-a-url", slack: "valid-url" }` passes
because the union matches the slack variant — the extra `discord` key
is ignored, and my key-presence check sees `"discord"` and lets it
through.
Replaced the union parse + key check with a provider-specific schema
lookup (`providerDataSchemas[provider].safeParse(data)`). Each
canonical channel schema is keyed by its provider name and validates
the shape / content of the value, so the new check catches both the
mismatched-provider and malformed-payload cases in one pass.
Added a test covering the exact case Cubic flagged — invalid `discord`
URL alongside a valid `slack` URL now rejects with ValidationError.
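The difference between the two checks is easier to see side by side. The sketch below uses hand-written validators rather than zod; `providerDataChecks`, `looksLikeUrl` and both check functions are hypothetical stand-ins, and the URL test is a deliberately loose assumption.

```typescript
type Provider = "discord" | "slack";
type Payload = Record<string, unknown>;

// Loose stand-in for a URL schema (assumption, not the real validator).
const looksLikeUrl = (v: unknown): boolean =>
  typeof v === "string" && /^https?:\/\/\S+$/.test(v);

// One validator per provider, keyed by provider name.
const providerDataChecks: Record<Provider, (data: Payload) => boolean> = {
  discord: (data) => looksLikeUrl(data["discord"]),
  slack: (data) => looksLikeUrl(data["slack"]),
};

// Earlier approach: union parse (any variant may match) plus key presence.
function unionPlusKeyCheck(provider: Provider, data: Payload): boolean {
  const providers = Object.keys(providerDataChecks) as Provider[];
  const anyVariantMatches = providers.some((p) => providerDataChecks[p](data));
  return anyVariantMatches && provider in data;
}

// Later approach: validate against the requested provider's schema only.
function providerSpecificCheck(provider: Provider, data: Payload): boolean {
  return providerDataChecks[provider](data);
}

const payload: Payload = {
  discord: "not-a-url",
  slack: "https://hooks.example.com/abc",
};
```

For `payload` with provider `"discord"`, the union-plus-key check accepts it (the slack variant matches and the `discord` key is present), while the provider-specific check rejects it.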
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain (PR 6/N) (#2107)
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
## Services (`packages/services/src/page-component/`)
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
## Surface
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
## Tests
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
## What changed
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
## Biome
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
## Deferred
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
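A minimal illustration of the echo fix; `dedupeMonitorIds` is a hypothetical helper name wrapping the same expression. `Set` preserves first-seen order, so the response order stays stable.

```typescript
// Drop repeated ids while keeping the first occurrence of each.
function dedupeMonitorIds(monitorIds: readonly string[]): string[] {
  return Array.from(new Set(monitorIds));
}

const echoed = dedupeMonitorIds(["1", "1", "2"]);
```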
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
## Services (`packages/services/src/page/`)
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection,
ip-restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
## tRPC (`packages/api/src/router/page.ts`)
All 13 procedures are thin wrappers. `delete` catches `NotFoundError`
for idempotency. `updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
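The three steps can be sketched with injected dependencies. Everything here is an assumption made for illustration: the `Deps` signatures, the `planDomainChange` helper, the use of an empty string for "no custom domain", and the remove-before-add ordering are not taken from the router.

```typescript
type DomainPlan = { remove?: string; add?: string };

// Decide which Vercel calls are needed when moving from `previous` to `next`.
function planDomainChange(previous: string, next: string): DomainPlan {
  if (previous === next) return {};
  const plan: DomainPlan = {};
  if (previous !== "") plan.remove = previous;
  if (next !== "") plan.add = next;
  return plan;
}

// Hypothetical signatures for the service and Vercel helpers.
type Deps = {
  getPage(pageId: number): Promise<{ customDomain: string }>;
  addDomainToVercel(domain: string): Promise<void>;
  removeDomainFromVercel(domain: string): Promise<void>;
  updatePageCustomDomain(pageId: number, customDomain: string): Promise<void>;
};

async function updateCustomDomain(
  deps: Deps,
  input: { pageId: number; customDomain: string },
): Promise<void> {
  // 1. Read the existing domain via the service.
  const page = await deps.getPage(input.pageId);
  // 2. Call Vercel only for the parts that changed.
  const plan = planDomainChange(page.customDomain, input.customDomain);
  if (plan.remove !== undefined) await deps.removeDomainFromVercel(plan.remove);
  if (plan.add !== undefined) await deps.addDomainToVercel(plan.add);
  // 3. Persist via the service.
  await deps.updatePageCustomDomain(input.pageId, input.customDomain);
}
```

Keeping the diff in a pure function makes the external-integration branch testable without touching Vercel or the database.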
## Enforcement
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
## Tests
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
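The shared-transaction behaviour in the `update` bullet can be sketched as follows. This is a simplified, synchronous model: the real `withTransaction` is presumably async and backed by the database, and the `ServiceContext` shape and the two service stubs here are assumptions that only borrow the real names.

```typescript
type Tx = { id: number };
type ServiceContext = { tx?: Tx };

let transactionsOpened = 0;

// Reuse a pre-opened transaction when the context carries one;
// otherwise open a new one. Commit / rollback are omitted in this sketch.
function withTransaction<T>(ctx: ServiceContext, fn: (tx: Tx) => T): T {
  if (ctx.tx !== undefined) {
    return fn(ctx.tx);
  }
  transactionsOpened += 1;
  const tx: Tx = { id: transactionsOpened };
  return fn(tx);
}

// Stubs standing in for two per-section services; each reports the id of
// the transaction it ran in.
function updatePageGeneral(ctx: ServiceContext): number {
  return withTransaction(ctx, (tx) => tx.id);
}

function updatePageLinks(ctx: ServiceContext): number {
  return withTransaction(ctx, (tx) => tx.id);
}

// Handler side: open one transaction and thread it through every service.
const usedTxIds = withTransaction({}, (tx) => {
  const shared: ServiceContext = { tx };
  return [updatePageGeneral(shared), updatePageLinks(shared)];
});
```

Both stubs run inside the single outer transaction, so a failure in either would roll back the whole update in a real implementation.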
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
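The `authEmailDomains` bug in the last bullet comes down to `undefined` versus `null`. In the sketch below, `applySet` is a hypothetical stand-in that mimics the "skip this column on `undefined`" behaviour described above; it is not Drizzle itself.

```typescript
type PageRow = { authEmailDomains: string | null };
type PagePatch = { authEmailDomains: string | null | undefined };

// Mimics an update where `undefined` means "leave the column untouched".
function applySet(row: PageRow, patch: PagePatch): PageRow {
  if (patch.authEmailDomains === undefined) return row;
  return { authEmailDomains: patch.authEmailDomains };
}

// Before the fix: a cleared field produces `undefined`.
function joinBuggy(domains: string[] | undefined): string | undefined {
  return domains?.join(",");
}

// After the fix: a cleared field produces an explicit `null`.
function joinFixed(domains: string[] | undefined): string | null {
  return domains?.join(",") ?? null;
}

const stale: PageRow = { authEmailDomains: "example.com" };
const afterBuggy = applySet(stale, { authEmailDomains: joinBuggy(undefined) });
const afterFixed = applySet(stale, { authEmailDomains: joinFixed(undefined) });
```

With the buggy join the stale value survives the update; with the `?? null` fallback the column is cleared.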
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
### Services (`packages/services/src/{workspace,user,invitation,api-key}/`)
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the
paid-plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
### tRPC (`packages/api/src/router/{workspace,user,invitation,apiKey}.ts`)
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
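A rough sketch of the wrapper shape, for orientation only. The error classes, `TransportError`, `toTransportError` and `callService` are local stand-ins rather than the real `toTRPCError` or `TRPCError`; the three mapped codes follow the mappings mentioned elsewhere in this log, and the `INTERNAL_SERVER_ERROR` default is an assumption.

```typescript
// Local stand-ins for the service-layer error classes.
class NotFoundError extends Error {}
class ForbiddenError extends Error {}
class ValidationError extends Error {}

type TransportCode =
  | "NOT_FOUND"
  | "FORBIDDEN"
  | "BAD_REQUEST"
  | "INTERNAL_SERVER_ERROR";

// Local stand-in for a transport-level error carrying a code.
class TransportError extends Error {
  code: TransportCode;
  constructor(code: TransportCode, message: string) {
    super(message);
    this.code = code;
  }
}

// Map a service error onto a transport error code.
function toTransportError(err: unknown): TransportError {
  const message = err instanceof Error ? err.message : "Unknown error";
  if (err instanceof NotFoundError) return new TransportError("NOT_FOUND", message);
  if (err instanceof ForbiddenError) return new TransportError("FORBIDDEN", message);
  if (err instanceof ValidationError) return new TransportError("BAD_REQUEST", message);
  return new TransportError("INTERNAL_SERVER_ERROR", message);
}

// The "thin wrapper": call the service, translate any failure.
async function callService<T>(serviceFn: () => Promise<T>): Promise<T> {
  try {
    return await serviceFn();
  } catch (err) {
    throw toTransportError(err);
  }
}
```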
### Enforcement
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
### Cleanup
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
### Deliberately out of scope
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
### Tests
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy /
bad-format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
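The P1 fix is a compare-and-set: the write itself re-checks the condition, and the number of returned rows decides the winner. The sketch below models that on an in-memory array; `markAccepted`, `acceptInvitation` and the local `ConflictError` class are hypothetical stand-ins, not the service code.

```typescript
type Invitation = { id: number; acceptedAt: Date | null };

class ConflictError extends Error {}

// Mimics `UPDATE ... SET acceptedAt = now
//         WHERE id = ? AND acceptedAt IS NULL RETURNING id`.
function markAccepted(table: Invitation[], id: number, now: Date): number[] {
  const returned: number[] = [];
  for (const row of table) {
    if (row.id === id && row.acceptedAt === null) {
      row.acceptedAt = now;
      returned.push(row.id);
    }
  }
  return returned;
}

// The membership insert only runs when the conditional update matched a row.
function acceptInvitation(
  table: Invitation[],
  memberships: number[],
  invitationId: number,
  userId: number,
): void {
  const returned = markAccepted(table, invitationId, new Date());
  if (returned.length === 0) {
    throw new ConflictError("invitation already accepted");
  }
  memberships.push(userId);
}
```

A second accept of the same invitation matches zero rows and throws before any membership is written.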
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
### Services (`packages/services/src/import/`)
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
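The per-resource failure handling described for `phase-writers.ts` might look roughly like this. It is a synchronous sketch with hypothetical names (`ResourceResult`, `writePhase`); the real writers talk to the database and track more statuses.

```typescript
type ResourceResult = {
  sourceId: string;
  status: "created" | "failed";
  error?: string;
};

// Write each resource independently: a throwing write marks only that
// resource as failed and the loop carries on with the rest.
function writePhase<T extends { sourceId: string }>(
  resources: readonly T[],
  write: (resource: T) => void,
): ResourceResult[] {
  return resources.map((resource): ResourceResult => {
    try {
      write(resource);
      return { sourceId: resource.sourceId, status: "created" };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { sourceId: resource.sourceId, status: "failed", error };
    }
  });
}

const results = writePhase(
  [{ sourceId: "a" }, { sourceId: "b" }, { sourceId: "c" }],
  (resource) => {
    if (resource.sourceId === "b") throw new Error("duplicate name");
  },
);
```

This is also why a single long transaction is a poor fit: one failed resource should not undo the ones that were written successfully.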
### tRPC (`packages/api/src/router/import.ts`)
124 lines → 28 lines. The router is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
### Error shape changes
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
### Tests
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
### Enforcement
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
### Cleanup
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
### Per-resource audit
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action |
| --- | --- |
| page | `page.create` |
| componentGroups | `page_component_group.create` |
| components | `page_component.create` |
| monitors | `monitor.create` |
| incidents | `status_report.create` + `status_report.add_update` per update |
| maintenances | `maintenance.create` |
| subscribers | `page_subscriber.create` |
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
For the change, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
### Cubic review fixes
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a
non-Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
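The exhaustive-switch pattern from the `provider.ts` fix, shown in isolation. `ProviderName`, `assertNever` and `adapterNameFor` are hypothetical names; the point is only the `never`-typed default.

```typescript
type ProviderName = "statuspage";

// Compile-time: only reachable if a union member is unhandled.
// Runtime: throws for any value that slipped past the type system.
function assertNever(value: never): never {
  throw new Error(`Unknown import provider: ${String(value)}`);
}

function adapterNameFor(provider: ProviderName): string {
  switch (provider) {
    case "statuspage":
      return "statuspage-adapter";
    default:
      // No silent fallback to the Statuspage adapter.
      return assertNever(provider);
  }
}
```

Adding a second member to `ProviderName` without a matching `case` becomes a type error at the `assertNever` call, and an unvalidated string at runtime throws instead of reaching the wrong adapter.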
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
### Changes
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
### Still out of scope (follow-ups)
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one
documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor: no dedicated test was added for the concurrent-accept race. The
conditional UPDATE + `ConflictError` path from the earlier P1 fix
covers it, and mocking the race reliably against SQLite is noisy and
not worth the test complexity. Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
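The two `listApiKeys` fixes (null filter, then the empty-list early return) can be sketched roughly as below. This is an illustration under assumptions, not the service code: `fetchUsersByIds` is a hypothetical stand-in for the `inArray` lookup, and the row shapes are invented for the example.

```typescript
type ApiKeyRow = { id: number; name: string; createdById: number | null };
type User = { id: number; email: string };
type ApiKeyWithCreator = ApiKeyRow & { createdBy: User | undefined };

// Hypothetical stand-in for the `inArray` user lookup. Like the query
// builder behaviour described above, it refuses an empty id list.
function fetchUsersByIds(ids: number[], users: User[]): User[] {
  if (ids.length === 0) {
    throw new Error("At least one value must be provided");
  }
  return users.filter((u) => ids.includes(u.id));
}

function attachCreators(keys: ApiKeyRow[], users: User[]): ApiKeyWithCreator[] {
  // Drop null creators first so the id list is a genuine number[].
  const creatorIds = Array.from(
    new Set(
      keys
        .map((k) => k.createdById)
        .filter((id): id is number => id !== null),
    ),
  );

  // Early return: every key is legacy, so there is nothing to look up.
  if (creatorIds.length === 0) {
    return keys.map((k) => ({ ...k, createdBy: undefined }));
  }

  const byId = new Map<number, User>();
  for (const user of fetchUsersByIds(creatorIds, users)) {
    byId.set(user.id, user);
  }
  return keys.map((k) => ({
    ...k,
    createdBy: k.createdById === null ? undefined : byId.get(k.createdById),
  }));
}
```

Without the early return, a workspace holding only legacy keys would reach the lookup with an empty list and throw.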
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/
router/import.ts` and `packages/services/src/import/limits.ts` — both
live in the next PR up the stack (#2111 / feat/services-import) and
will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Now short-circuits to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
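A minimal sketch of the clamp behaviour described above. Only `"30s"` as the fastest tier comes from this commit; the other tier names, the function signature, and the empty-`allowed` fallback to the slowest tier overall are assumptions made for illustration.

```typescript
// Assumed ordering, fastest first. Only "30s" is confirmed by the text above.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicity(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity {
  // Slowest tier the plan allows; "1h" is an assumed fallback for an
  // empty `allowed` list.
  const slowestAllowed =
    PERIODICITY_ORDER.slice()
      .reverse()
      .find((tier) => allowed.includes(tier)) ?? "1h";

  const start = PERIODICITY_ORDER.indexOf(requested as Periodicity);
  // Unknown value: never start the walk at the fastest tier.
  if (start === -1) return slowestAllowed;

  // Walk from the requested tier toward slower tiers only.
  return (
    PERIODICITY_ORDER.slice(start).find((tier) => allowed.includes(tier)) ??
    slowestAllowed
  );
}
```

The key point is the `start === -1` branch: clamping the index to `0` would begin the walk at the fastest tier, which is the bug this finding describes.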
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would crash `updateNotification` with `SyntaxError` during
the partial-update read-modify-write. Added `?? "{}"` fallback and a
comment pointing at the schema.
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
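The double-encoding failure can be reproduced without a database. In this sketch `writeJsonColumn` and `readJsonColumn` are hypothetical stand-ins for what a JSON-mode column does on write and read; the `configuration` value is made up.

```typescript
// Stand-ins for a JSON-mode text column: serialise on write, parse on read.
function writeJsonColumn(value: unknown): string {
  return JSON.stringify(value);
}
function readJsonColumn(stored: string): unknown {
  return JSON.parse(stored);
}

const configuration = { theme: "default", uptime: true };

// Correct: hand the object to the column and let it serialise once.
const roundTripped = readJsonColumn(writeJsonColumn(configuration));

// Buggy: stringify first, then the column stringifies again.
const doubleEncoded = readJsonColumn(
  writeJsonColumn(JSON.stringify(configuration)),
);

// After one parse the buggy value is still a string, so a later
// object merge on it would not see any keys.
const kinds = [typeof roundTripped, typeof doubleEncoded];
```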
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
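The difference between the two operators is easy to check in isolation. The helper names below are hypothetical; only the `join(",")` expression and the operator swap come from the commit.

```typescript
// Before: `??` only replaces null/undefined, so "" from [].join(",") survives.
function joinOrNullBefore(values?: string[] | null): string | null {
  return values?.join(",") ?? null;
}

// After: `||` also replaces the empty string, so [] clears the column.
function joinOrNullAfter(values?: string[] | null): string | null {
  return values?.join(",") || null;
}

const before = joinOrNullBefore([]); // empty string, not null
const after = joinOrNullAfter([]); // null
const kept = joinOrNullAfter(["example.com", "example.org"]);
```

`||` is safe here because the only falsy string a join can produce is the empty one.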
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
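The sequential approach can be sketched as follows. `insertGroup` is a hypothetical callback standing in for a single-row insert that returns the new id (the real call is presumably asynchronous), and the input shape is an assumption.

```typescript
type GroupInput = { name: string; componentNames: string[] };

function assignComponentsToGroups(
  groups: GroupInput[],
  insertGroup: (name: string) => number,
): Map<string, number> {
  const groupIdByComponent = new Map<string, number>();
  for (const group of groups) {
    // Capture the id of this specific group before moving on, so the
    // mapping never depends on the order rows come back in.
    const groupId = insertGroup(group.name);
    for (const componentName of group.componentNames) {
      groupIdByComponent.set(componentName, groupId);
    }
  }
  return groupIdByComponent;
}
```

Because each id is bound to its group at insert time, there is no positional pairing left to go wrong.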
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. The
service now uses the joined value directly: one round-trip instead of
two, which also closes the read-skew window where a just-renamed
workspace could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
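A sketch of the fallback, assuming a helper of this shape exists somewhere in the audit path. `resolveAuditEntity` is a hypothetical name; `entityType` and `entityId` are the fields named in this commit.

```typescript
type AuditEntity = { entityType: "page" | "workspace"; entityId: number };

// Attribute the audit row to the page when one exists, otherwise to the
// workspace, instead of writing a placeholder id of 0.
function resolveAuditEntity(
  targetPageId: number | null | undefined,
  workspaceId: number,
): AuditEntity {
  if (targetPageId == null) {
    return { entityType: "workspace", entityId: workspaceId };
  }
  return { entityType: "page", entityId: targetPageId };
}
```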
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
**P2 — `writeComponentsPhase` silent monitor→static fallback.**
When the source monitor failed to resolve (e.g. `includeMonitors ===
false`), the component was silently degraded to `type = "static"`
and reported as `created` with no explanation. Other phase writers
populate `resource.error` on any degrade path. Added a matching
error string pointing at the source monitor id (or lack thereof) so
the summary conveys the degrade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): revert discriminated union to flat + .refine
The previous commit switched `componentInput` in the service schema to
a `z.discriminatedUnion("type", [...])` to get a clean `ZodError` at
parse time for the monitor/static invariant. That produced a narrowed
TS shape (`type: "monitor"` → required `monitorId: number`) that every
caller had to match — including the dashboard form, where
react-hook-form can't model discriminated unions cleanly and would
have needed a flat→union adapter at submit time. The ripple was
user-visible frontend churn for a schema-layer concern.
Switched back to a flat `z.object` + cross-field `.refine` on
`(type, monitorId)`. Same parse-time rejection Cubic asked for
(ZodError with a specific path, not an opaque SQLite CHECK failure),
but the inferred TS type stays flat so callers — router input, RHF
form values — keep their existing shape.
Also restored the downstream `&& c.monitorId` guards and
`as number[]` casts in `update-order.ts`. With a flat schema, TS
still sees `monitorId: number | null | undefined` on the monitor
branch; the refine rejects violating input at parse time, but the
guard is needed to narrow for the type system. Matches the
pre-migration shape exactly.
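The trade-off can be illustrated without the schema library. Below, `hasValidMonitorLink` is a hand-written stand-in for the cross-field predicate a `.refine` would carry, and `ComponentInput` is an assumed flat shape (the `name` field is invented for the example).

```typescript
type ComponentInput = {
  type: "monitor" | "static";
  monitorId?: number | null;
  name: string;
};

// Cross-field rule: "monitor" needs a monitor id, "static" must not have one.
function hasValidMonitorLink(c: ComponentInput): boolean {
  return c.type === "monitor" ? c.monitorId != null : c.monitorId == null;
}

// The type stays flat, so the compiler still sees `monitorId` as optional
// on the monitor branch. The runtime guard plus cast narrows it.
function monitorIdsOf(components: ComponentInput[]): number[] {
  return components
    .filter((c) => c.type === "monitor" && c.monitorId)
    .map((c) => c.monitorId as number);
}
```

The predicate rejects bad input at parse time, while the guard in `monitorIdsOf` exists purely to satisfy the type system, which matches the reasoning given for restoring the guards.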
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): reconcile component links on idempotent skip
The idempotency checks in `writeIncidentsPhase` and
`writeMaintenancesPhase` added in the previous pass correctly avoid
duplicate status-report / maintenance rows on rerun, but `continue`-d
out of the writer before the component-link insertion block. The
failure mode this leaves open:
1. Run 1: component phase uses per-resource catch, so a single
component can fail and leave `componentIdMap` partial.
2. The report/maintenance is written with a subset of the intended
links — only the entries whose source id resolved in the map.
3. Run 2: the previously-failed component now succeeds and lands in
`componentIdMap`. The report/maintenance idempotency check hits,
`continue` fires, and the still-missing link is never written.
Both join tables (`statusReportsToPageComponents`,
`maintenancesToPageComponents`) have a composite primary key on
`(parentId, pageComponentId)`. Running the same link-build pass on
the skip path with `.onConflictDoNothing()` is a no-op for the links
already present and adds any that resolved this time round. Matches
the "reruns converge to correct state" model that motivated the
idempotency checks in the first place.
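The convergence argument can be sketched with an in-memory table. `LinkTable` is a hypothetical stand-in for a join table with a composite primary key, and `reconcileLinks` is an assumed name for the link-build pass.

```typescript
type Link = { parentId: number; pageComponentId: number };

// In-memory stand-in for a join table keyed on (parentId, pageComponentId).
class LinkTable {
  private rows = new Map<string, Link>();

  // Mimics an insert that ignores conflicts: returns true only when a
  // new row was actually written.
  insertIgnoringConflict(link: Link): boolean {
    const key = `${link.parentId}:${link.pageComponentId}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, link);
    return true;
  }

  size(): number {
    return this.rows.size;
  }
}

// Runs on both the insert path and the idempotent-skip path.
function reconcileLinks(
  table: LinkTable,
  parentId: number,
  sourceComponentIds: string[],
  componentIdMap: Map<string, number>,
): number {
  let inserted = 0;
  for (const sourceId of sourceComponentIds) {
    const pageComponentId = componentIdMap.get(sourceId);
    if (pageComponentId === undefined) continue; // not resolved this run
    if (table.insertIgnoringConflict({ parentId, pageComponentId })) {
      inserted++;
    }
  }
  return inserted;
}
```

Calling `reconcileLinks` again after the missing component resolves adds only the missing link, and a third call adds nothing.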
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Claude review pass
Three small findings from the latest Claude review, bundled:
**#1 — `list.ts` redundant `conditions` array.** `listNotifications`
initialised a one-element `SQL[]` and spread it into `and(...)` with
no second push site anywhere. Collapsed to a direct `eq(...)`. The
pattern is load-bearing in `monitor/list.ts` (two conditions: workspace
scope + soft-delete filter) but notifications have no soft-delete
column, so the indirection was pure noise.
**#2 — router `dataInputSchema` duplicated the service schema.**
`packages/api/src/router/notification.ts` hand-rolled a
`z.partialRecord` structurally identical to the `dataSchema` inside
`packages/services/src/notification/schemas.ts`. Drift hazard: a
future provider-value shape change in the service would be accepted
at the tRPC layer and fail with an opaque error deeper in
`validateNotificationData`. Renamed the service schema to
`NotificationDataInputSchema`, exported it from the service barrel,
and replaced the router's local copy + the now-unused
`servicesNotificationProvider` alias.
**#5 — `update.ts` audit missing `provider` metadata.**
`createNotification` attaches `metadata: { provider: input.provider
}` to its audit row; `updateNotification` didn't. The `provider`
column is recoverable from the `before`/`after` rows, but asymmetric
metadata breaks simple `action + metadata.provider` audit queries.
Added `metadata: { provider: existing.provider }` for parity.
Skipped the two non-fixes: the `enrichNotificationsBatch` drizzle-cast
fragility is the same pattern as `monitor/list.ts`, worth a codebase-
wide change rather than a single-domain carve-out; the `dataSchema`
being "intentionally wide" is already called out in the schema JSDoc
and is correct by design (provider/payload alignment is enforced at
the service boundary by `validateNotificationData`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(services/notification): cover update audit + post-downgrade gate
Claude review noted two gaps in the `updateNotification` suite. Adding
both:
**`notification.update` audit row.** `createNotification` already
asserts an audit row fires with `expectAuditRow`; the update path was
silent. With the per-mutation audit contract (every write emits a
row), the update case needs an equivalent pin so a regression that
drops the emit site is caught. Note: the v1 audit buffer shape
(`AuditLogRecord`) doesn't carry `metadata`, so the `{ provider }`
payload can't be asserted directly here — that coverage lands with
the v2 audit-table move, called out in the test comment.
**Plan-gate after a downgrade.** The Cubic-flagged fix added
`assertProviderAllowed(existing.provider)` to `updateNotification`
so a previously-allowed channel becomes read-only once the workspace
drops to a plan that no longer includes it. The regression test
simulates the downgrade by directly inserting a `pagerduty` row into
the free workspace (bypasses the create-time gate) and then calls
`updateNotification` via `freeCtx` — asserts `LimitExceededError`.
Without the gate the update would silently succeed and leave a
channel on an unsupported plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic P1/P2/P3 pass across the stack
Twelve findings from Cubic's 2026-04-24 review on #2106. Fixes are
bundled on this branch because the later-PR content has cascaded
down via the 2107 squash-merge — each item is called out by file so
the commit is easy to skim.
**P1 — `import/phase-writers.ts` subscribers reconciliation.**
`writeIncidentsPhase` and `writeMaintenancesPhase` already
reconcile component links on the idempotent-skip path.
`writeSubscribersPhase` didn't, so `pageSubscriberToPageComponent`
rows lost to a partial first run are never recovered. Same
`.onConflictDoNothing()` pattern, same composite-PK rationale.
**P1 — `page/schemas.ts:129` UpdatePageAppearance accepts any theme.**
`configuration.theme` was `z.string()`, while the read path parses
configuration through `pageConfigurationSchema`, which uses the
`THEME_KEYS` enum. The input now validates against the same enum, so
a write can no longer persist a theme the read path rejects. The
router reuses `UpdatePageAppearanceInput` from services so the
tRPC boundary enforces the same enum.
**P1 — `page/schemas.ts:160` UpdatePageConfiguration too permissive.**
Replaced the `z.record(z.string(), z.string()|z.boolean())` with
`pageConfigurationSchema.nullish()`, which is what the read path
already uses. Dashboard form's free-typed `configuration` values are
cast at the mutation call-site (tRPC input parse catches invalid
submits).
**P2 — `page-component/internal.ts:39` validateMonitorIds.**
Added `isNull(monitor.deletedAt)` so tombstoned monitor ids don't
pass as valid attach targets.
**P2 — `page-component/list.ts:71` enrichment returns deleted monitors.**
Same `isNull(monitor.deletedAt)` filter on the enrichment lookup so
`component.monitor` is never populated with a tombstoned row.
**P2 — `page-component/update-order.ts:277` stale ID set for statics.**
`existingComponentIds` was built from the pre-delete snapshot of
`existingComponents`. If an input static carried an id that matched
a monitor component just removed (because its `monitorId` dropped
out of the input set), the `has(id)` check sent the row to the
UPDATE branch, which silently no-op'd against the now-deleted id
and lost the new static. New set filters by `type === "static"` and
subtracts `removedComponentIds` so only surviving static ids take
the update path.
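A minimal sketch of the corrected set construction. The function name and row shape are assumptions; `removedComponentIds` and the `type === "static"` filter come from the description above.

```typescript
type ExistingComponent = { id: number; type: "monitor" | "static" };

// Only ids of static components that survive the delete step may take
// the UPDATE branch; everything else must be inserted.
function survivingStaticIds(
  existingComponents: ExistingComponent[],
  removedComponentIds: Set<number>,
): Set<number> {
  return new Set(
    existingComponents
      .filter((c) => c.type === "static" && !removedComponentIds.has(c.id))
      .map((c) => c.id),
  );
}
```

An input static whose id matches a just-removed monitor component is absent from this set, so it is inserted rather than sent to an UPDATE that matches no row.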
**P2 — `invitation/accept.ts:75` re-assert invitation expiry.**
Conditional UPDATE only re-checked `acceptedAt IS NULL`. An
invitation expiring between the initial read and the update could
still be claimed. Added `gte(expiresAt, now)` to the UPDATE predicate.
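The tightened predicate can be sketched in memory. `claimInvitation` and the `Invitation` shape are hypothetical; the two conditions mirror `acceptedAt IS NULL` and `gte(expiresAt, now)` from the description.

```typescript
type Invitation = { token: string; acceptedAt: Date | null; expiresAt: Date };

// Stand-in for the conditional UPDATE: the claim succeeds only if both
// conditions hold at write time, not merely at the earlier read.
function claimInvitation(invitation: Invitation, now: Date): boolean {
  const stillPending = invitation.acceptedAt === null;
  const notExpired = invitation.expiresAt.getTime() >= now.getTime();
  if (!stillPending || !notExpired) return false;
  invitation.acceptedAt = now;
  return true;
}
```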
**P2 — `invitation/accept.ts:82` duplicate workspace membership.**
Added `.onConflictDoNothing()` to the `usersToWorkspaces` insert,
keyed on the `(userId, workspaceId)` composite PK. An already-member
invitee previously tripped the unique constraint *after* the
invitation was stamped accepted.
**P2 — `import/limits.ts` warnings ignore run options.**
`addLimitWarnings` now takes `options` and gates each per-phase
warning on the corresponding include flag. Threaded from both
`run.ts` and `preview.ts`; `PreviewImportInput` gained an `options`
field so previews and runs produce the same warning set.
`ImportOptions` moved above `PreviewImportInput` (forward ref).
**P2 — `user/delete.ts` + `service-adapter.ts` PRECONDITION_FAILED.**
Paid-workspace guard threw `ForbiddenError`, which `toTRPCError`
mapped to `FORBIDDEN`. The pre-migration router surfaced this as
`PRECONDITION_FAILED` so the UI could distinguish "missing
permissions" from "missing prerequisite state". Added a
`PreconditionFailedError` class + `PRECONDITION_FAILED` service
error code + matching tRPC mapping. Exported from the services
root barrel.
**P2 — `router/import.test.ts` limits override didn't stick.**
`enforceUserIsAuthed` replaced the test's `makeCaller`-provided
workspace with the seeded team plan on every call, so limit
assertions silently ran against team limits. Added a
`NODE_ENV==='test' && workspace!=null && user!=null` short-circuit
at the top of the middleware that trusts the pre-populated inner
context. `makeCaller` now also stamps `user` so the escape hatch
triggers. Test-only behaviour, production path unchanged.
**P3 — `invitation.test.ts` members-cap seed.**
Test assumed the free workspace had an owner already occupying the
`members: 1` slot, but the shared DB seed only adds a row for
workspace 1 (team). The test now seeds a membership for the free
workspace in `beforeAll` and tears it down in `afterAll`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): declare @openstatus/theme-store workspace dep
Follow-up to the Cubic P1 schema-tightening commit. `page/schemas.ts`
now imports `THEME_KEYS` / `ThemeKey` from `@openstatus/theme-store`
to enforce the canonical theme enum at the service input boundary,
but the dep wasn't declared in the services package.json. Next/
Turbopack module resolution blew up when apps that consume the
service (status-page build path) tried to evaluate the schema file.
Also added the same dep to `apps/server` — it depends on
`@openstatus/services` and needs the transitive module resolvable.
Dashboard / status-page already had it declared.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(server): regenerate Dockerfile + lockfile
- `dofigen update` bumped `debian:bullseye-slim` base image digest.
- `dofigen gen` regenerated the Dockerfile to match.
- `pnpm i` refreshed lockfile for the new
`@openstatus/theme-store` workspace dep on services / server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate monitor domain tRPC onto service layer
Fourth domain, the largest so far. Migrates the **tRPC** monitor router
(all 14 procedures) onto `@openstatus/services/monitor`. The Connect
handler (`apps/server/src/routes/rpc/services/monitor/`) and the v1 REST
`apps/server/src/routes/v1/monitors/*` endpoints stay untouched for now
— they are separate external-API surfaces and warrant their own
follow-up PRs with their own test suites.
## Services (`packages/services/src/monitor/`)
- `createMonitor`, `cloneMonitor`, `deleteMonitor`, `deleteMonitors`
(bulk soft-delete), `getMonitor`, `listMonitors`.
- `updateMonitorGeneral` — preserves the existing tRPC `updateGeneral`
behaviour including jobType switching (HTTP ↔ TCP ↔ DNS). Called out
as a code smell during exploration, but preserved intentionally since
it's the dashboard's current edit flow. Kept jobType-agnostic rather
than split into 3 separate update methods.
- `updateMonitorRetry`, `updateMonitorFollowRedirects`,
`updateMonitorOtel`, `updateMonitorPublic`,
`updateMonitorResponseTime` — field-specific setters.
- `updateMonitorSchedulingRegions` — enforces plan limits
(periodicity / region-count / region-access) + validates the
private-location ids before replacing the association set.
- `updateMonitorTags`, `updateMonitorNotifiers` — validate and
replace the tag / notifier association sets.
- `bulkUpdateMonitors` — batched toggle of `public` / `active`
across multiple monitor ids. Matches the old `updateMonitors`
procedure.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- Cascade deletes: `monitor_tag_to_monitor`,
`notifications_to_monitors`, `page_component` rows are torn down on
delete (matches pre-migration behaviour — bypasses FK cascades
because some rows reference the monitor without cascade).
## Shared helpers (`monitor/internal.ts`)
- `validateTagIds` / `validateNotificationIds` / `validatePrivateLocationIds`
— workspace-scoped validators with dedupe.
- `pickDefaultRegions(workspace)` — ports the old "randomly pick 4
free / 6 paid regions excluding deprecated" logic.
- `serialiseAssertions` / `headersToDbJson` — assertion / header
serialisation moved out of the tRPC router.
- `countMonitorsInWorkspace` for quota checks.
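The default-region picker can be sketched roughly as follows. This version takes a plan label and a region list directly, whereas the real `pickDefaultRegions(workspace)` reads from `regionDict`; the `Region` shape, the injectable `random` parameter, and the sample codes are assumptions.

```typescript
// Hypothetical region shape; the real data comes from `@openstatus/regions`.
interface Region {
  code: string;
  deprecated: boolean;
}

// Fisher-Yates shuffle on a copy, with an injectable RNG so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Picks 4 regions on the free plan and 6 on paid plans, never a deprecated one.
function pickDefaultRegions(
  plan: "free" | "paid",
  regions: readonly Region[],
  random: () => number = Math.random,
): string[] {
  const count = plan === "free" ? 4 : 6;
  const eligible = regions.filter((region) => !region.deprecated);
  return shuffle(eligible, random).slice(0, count).map((region) => region.code);
}

// Sample data only; these codes are placeholders, not the real region list.
const sampleRegions: Region[] = [
  { code: "r1", deprecated: false },
  { code: "r2", deprecated: false },
  { code: "r3", deprecated: true },
  { code: "r4", deprecated: false },
  { code: "r5", deprecated: false },
  { code: "r6", deprecated: false },
  { code: "r7", deprecated: false },
  { code: "r8", deprecated: false },
];
console.log(pickDefaultRegions("free", sampleRegions));
```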
## list.ts — batched enrichment
`listMonitors` does two IN queries (tags + incidents) regardless of
list size. `getMonitor` reuses the same path, passing a single-element
list with the richer `{ notifications, privateLocations }` toggle. Same
pattern as status-report's batched fix.
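The batched-enrichment idea, sketched with an in-memory stand-in for the `IN` query. The `Monitor` / `TagLink` shapes and the helper names are hypothetical; the point is that one query serves the whole list and the single-item path reuses it.

```typescript
interface Monitor {
  id: number;
  name: string;
}

interface TagLink {
  monitorId: number;
  tag: string;
}

// In-memory stand-in for the link table.
const tagLinks: TagLink[] = [
  { monitorId: 1, tag: "prod" },
  { monitorId: 1, tag: "api" },
  { monitorId: 3, tag: "staging" },
];

// Stand-in for `SELECT ... WHERE monitor_id IN (...)`; counts round trips.
let queryCount = 0;
function selectTagsWhereMonitorIdIn(ids: readonly number[]): TagLink[] {
  queryCount += 1;
  const wanted = new Set(ids);
  return tagLinks.filter((link) => wanted.has(link.monitorId));
}

// One query for the whole list, then group rows by monitor id in memory.
function enrichWithTags(monitors: readonly Monitor[]): Array<Monitor & { tags: string[] }> {
  if (monitors.length === 0) return [];
  const rows = selectTagsWhereMonitorIdIn(monitors.map((m) => m.id));
  const byMonitor = new Map<number, string[]>();
  for (const row of rows) {
    const bucket = byMonitor.get(row.monitorId) ?? [];
    bucket.push(row.tag);
    byMonitor.set(row.monitorId, bucket);
  }
  return monitors.map((m) => ({ ...m, tags: byMonitor.get(m.id) ?? [] }));
}

// The single-item path reuses the batch path with a one-element list.
function enrichOne(monitor: Monitor): Monitor & { tags: string[] } {
  const [enriched] = enrichWithTags([monitor]);
  return enriched ?? { ...monitor, tags: [] };
}
```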
## Surfaces migrated
- **tRPC** (`packages/api/src/router/monitor.ts`): every procedure is a
thin wrapper. `delete` catches `NotFoundError` for idempotency.
`new` / `updateGeneral` keep the `testHttp` / `testTcp` / `testDns`
pre-save check at the tRPC layer — services unconditionally save,
callers decide whether to pre-validate.
## Out of scope (follow-up PRs)
- Connect RPC handler at `apps/server/src/routes/rpc/services/monitor/`
— still uses its own helpers. Will be migrated in a follow-up
(4b).
- v1 REST endpoints at `apps/server/src/routes/v1/monitors/*` — yet
another external surface; dedicated PR later.
- `packages/api/src/service/import.ts` monitor writes — covered by
the plan's dedicated PR 9.
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/monitor.ts`. The Connect monitor handler
stays out of scope until 4b.
- Subpath export `@openstatus/services/monitor`.
## Tests
- `packages/services/src/monitor/__tests__/monitor.test.ts` —
create (http / tcp / dns), delete cascade, bulk delete, clone,
tags / notifiers validation (forbidden when cross-workspace),
list / get with workspace isolation + soft-delete hiding,
updateMonitorGeneral round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): declare @openstatus/regions + @openstatus/assertions deps
The monitor domain pulls `regionDict` from `@openstatus/regions` (for
the default-region picker) and assertion classes / validators from
`@openstatus/assertions`. Both were missing from
`packages/services/package.json`, so Vercel's Next.js build for
`apps/status-page` — which transitively imports the services package
via `@openstatus/api` — failed with "Module not found" on
`packages/services/src/monitor/internal.ts:20` and `schemas.ts:1`.
Local pnpm workspaces resolved the imports via hoisting, which masked
the missing declarations until an actual Next.js build forced strict
resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/monitor): address Cubic review
- **`timeout` / `degradedAfter` bounds** (schemas.ts) — mirror the
0–60_000 ms cap from `insertMonitorSchema`. Values outside the range
are rejected before the UPDATE instead of being silently persisted.
- **`jsonBody` assertion mapping** (internal.ts) — the serialiser was
silently dropping `jsonBody`-typed input because the runtime class
wasn't wired up. Added the `JsonBodyAssertion` branch; the class has
existed in `@openstatus/assertions` the whole time, this was just a
missing case.
- **Clone resets `status`** (clone.ts) — cloning no longer inherits the
source's current `error`/`degraded` health. Freshly cloned monitors
start at `"active"` and settle on their first check.
- **Delete filters soft-deleted** (delete.ts) — the pre-check now
includes `isNull(monitor.deletedAt)`, so a repeat delete returns
`NotFoundError` (preserved as idempotent at the tRPC layer) instead
of re-running the cascades and emitting duplicate audits.
- **List/get workspace scope on relations** (list.ts) — `enrichMonitorsBatch`
takes `workspaceId` and scopes the incident / tag / notification /
private-location IN queries to the caller's workspace. Defence-in-depth
against inconsistent FK data (none of those tables enforce workspace
ownership at the FK level).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate notification CRUD (PR 5/N) (#2106)
* feat(services): migrate notification CRUD onto service layer
Fifth domain. Migrates the four CRUD procedures — `list`, `new`,
`updateNotifier`, `delete` — of the notification tRPC router onto
`@openstatus/services/notification`. The three integration-helper
procedures (`sendTest`, `createTelegramToken`, `getTelegramUpdates`)
stay inline at the tRPC layer and are explicitly out of scope for this
migration.
## Services (`packages/services/src/notification/`)
- `createNotification` — enforces plan limits on both the channel count
(`notification-channels`) and the provider itself (`sms` / `pagerduty`
/ `opsgenie` / `grafana-oncall` / `whatsapp` require plan flags);
validates the loose `data` payload against `NotificationDataSchema`;
validates monitor ids are in-workspace and not soft-deleted.
- `updateNotification` — replaces name / data / monitor associations in
a single transaction with the same validation rules as create.
- `deleteNotification` — hard-delete (FK cascade clears associations).
The tRPC wrapper swallows `NotFoundError` to preserve the old
idempotent behaviour.
- `listNotifications`, `getNotification` — batched IN query enriches
monitors per notification. Monitor enrichment is workspace-scoped and
filters soft-deleted monitors for defence-in-depth.
- All mutations run inside `withTransaction`, emit `emitAudit`.
## Surface migrated
- **tRPC** (`packages/api/src/router/notification.ts`): `list` / `new` /
`updateNotifier` / `delete` become thin service wrappers. `sendTest`
+ `createTelegramToken` + `getTelegramUpdates` are unchanged.
## Out of scope (flagged for follow-up)
- **`sendTest` migration** — the dispatch switch imports from 10
`@openstatus/notification-*` packages. Moving it into services would
pull those as direct deps; the plan's phrasing ("Channel CRUD +
test-dispatch") allows this as a later extraction.
- **`createTelegramToken` / `getTelegramUpdates`** — redis + external
Telegram API helpers; transport UX, not domain operations.
- **Biome scope for `notification.ts`** — the file still imports
`@openstatus/db/src/schema` for the `sendTest` provider data schemas.
Will land with the sendTest migration follow-up.
- **Connect RPC notification handler** — stays on its own helpers;
follow-up aligned with PR 4's Connect deferral.
## Tests
- `__tests__/notification.test.ts` covers create (including
`ValidationError` on malformed data, `LimitExceededError` on gated
provider, `ForbiddenError` on cross-workspace monitor), update
(association replacement, cross-workspace `NotFoundError`), delete,
list/get workspace isolation + monitor enrichment scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Cubic review
- **Update flow plan gate** (update.ts) — `updateNotification` was
skipping `assertProviderAllowed`, so a user who downgraded their plan
could still edit a notification configured with a now-restricted
provider. Re-check against the stored `existing.provider` to match
the create-time gate.
- **Provider / data match** (internal.ts) — `NotificationDataSchema` is
a union, so `{ provider: "discord", data: { slack: "…" } }` passed
the union check even though the payload key doesn't match the
provider. `validateNotificationData` now takes the provider and
asserts `provider in data` after the top-level parse. Applied in
both `create` and `update` (update uses the stored provider since
the API doesn't allow provider changes).
Added a test for the mismatch case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): tighten data validation against provider schema
Cubic's follow-up on the previous fix was right: checking only
`provider in data` isn't enough. `NotificationDataSchema` is a union,
so a payload like `{ discord: "not-a-url", slack: "valid-url" }` passes
because the union matches the slack variant — the extra `discord` key
is ignored, and my key-presence check sees `"discord"` and lets it
through.
Replaced the union parse + key check with a provider-specific schema
lookup (`providerDataSchemas[provider].safeParse(data)`). Each
canonical channel schema is keyed by its provider name and validates
the shape / content of the value, so the new check catches both the
mismatched-provider and malformed-payload cases in one pass.
Added a test covering the exact case Cubic flagged — invalid `discord`
URL alongside a valid `slack` URL now rejects with ValidationError.
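To illustrate why the union parse plus key check was insufficient, here is a small sketch using plain validator functions in place of the real zod schemas. Only the `providerDataSchemas` name comes from the commit; the two-provider set, the URL rule, and the helper names are assumptions.

```typescript
type Provider = "discord" | "slack";
type Payload = Record<string, unknown>;

// Simplified URL rule standing in for the real per-channel schema.
const isHttpsUrl = (value: unknown): boolean =>
  typeof value === "string" && /^https:\/\/\S+$/.test(value);

// One validator per provider, keyed by provider name.
const providerDataSchemas: Record<Provider, (data: Payload) => boolean> = {
  discord: (data) => isHttpsUrl(data["discord"]),
  slack: (data) => isHttpsUrl(data["slack"]),
};

const allProviders: Provider[] = ["discord", "slack"];

// Union semantics: the payload passes if ANY variant accepts it.
function validateAsUnion(data: Payload): boolean {
  return allProviders.some((provider) => providerDataSchemas[provider](data));
}

// Previous check: union parse plus "is the provider key present?".
function passesOldCheck(provider: Provider, data: Payload): boolean {
  return validateAsUnion(data) && provider in data;
}

// New check: validate against the schema of the stored provider only.
function passesNewCheck(provider: Provider, data: Payload): boolean {
  return providerDataSchemas[provider](data);
}

const mixedPayload: Payload = {
  discord: "not-a-url",
  slack: "https://hooks.slack.com/services/T000/B000/XXXX",
};
console.log(passesOldCheck("discord", mixedPayload), passesNewCheck("discord", mixedPayload));
```

The old check accepts the mixed payload for `discord` because the `slack` variant satisfies the union; the provider-specific lookup rejects it.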
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain (PR 6/N) (#2107)
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
## Services (`packages/services/src/page-component/`)
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
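Steps 4 and 6 above can be sketched as a pure planning function for the monitor components. The shapes and the `planMonitorComponentChanges` name are hypothetical; the real service performs these writes inside one transaction and relies on `onConflictDoUpdate` rather than an in-memory lookup.

```typescript
interface ExistingComponent {
  id: number;
  monitorId: number;
}

interface OrderInput {
  monitorId: number;
  order: number;
}

interface ReconcilePlan {
  // Ids of existing components whose monitor is no longer in the input.
  removeIds: number[];
  // Rows to upsert; `id` is kept when the (pageId, monitorId) pair already exists.
  upserts: Array<{ id: number | null; monitorId: number; order: number }>;
}

function planMonitorComponentChanges(
  existing: readonly ExistingComponent[],
  input: readonly OrderInput[],
): ReconcilePlan {
  const wanted = new Set(input.map((item) => item.monitorId));
  const idByMonitor = new Map(existing.map((c) => [c.monitorId, c.id] as const));

  const removeIds = existing
    .filter((component) => !wanted.has(component.monitorId))
    .map((component) => component.id);

  const upserts = input.map((item) => ({
    id: idByMonitor.get(item.monitorId) ?? null,
    monitorId: item.monitorId,
    order: item.order,
  }));

  return { removeIds, upserts };
}

const reconcilePlan = planMonitorComponentChanges(
  [
    { id: 10, monitorId: 1 },
    { id: 11, monitorId: 2 },
  ],
  [
    { monitorId: 2, order: 0 },
    { monitorId: 3, order: 1 },
  ],
);
console.log(reconcilePlan);
```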
## Surface
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
## Tests
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
## What changed
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
## Biome
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
## Deferred
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
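The fix in isolation, as a tiny sketch (the `echoMonitorIds` wrapper is a hypothetical name for illustration). `Set` preserves first-insertion order, so the echoed list keeps the caller's ordering minus duplicates.

```typescript
// Dedupe while preserving the order in which ids first appeared.
function echoMonitorIds(requested: readonly string[]): string[] {
  return Array.from(new Set(requested));
}

console.log(echoMonitorIds(["1", "1", "2"])); // ["1", "2"]
```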
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
## Services (`packages/services/src/page/`)
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection,
ip-restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
## tRPC (`packages/api/src/router/page.ts`)
All 13 procedures are thin wrappers. `delete` catches `NotFoundError`
for idempotency. `updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
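The decision in step 2 can be sketched as a pure function that turns the previous and next domain into a list of Vercel actions. Treating an empty string as "no custom domain" and removing before adding are both assumptions; the router's actual ordering is not shown here.

```typescript
type DomainAction =
  | { kind: "remove"; domain: string }
  | { kind: "add"; domain: string };

// Assumption: "" means the page has no custom domain.
function planDomainChange(previous: string, next: string): DomainAction[] {
  if (previous === next) return [];
  const actions: DomainAction[] = [];
  if (previous !== "") actions.push({ kind: "remove", domain: previous });
  if (next !== "") actions.push({ kind: "add", domain: next });
  return actions;
}

console.log(planDomainChange("status.old.example", "status.new.example"));
```

The caller would run each action against `removeDomainFromVercel` / `addDomainToVercel` and only then persist via `updatePageCustomDomain`.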
## Enforcement
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
## Tests
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
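The shared-transaction behaviour in the `update` bullet above can be modelled like this. It is a synchronous toy: how the real `withTransaction` detects a pre-opened tx (a context field, async-local storage, or something else) is not shown in this log, so the `ctx.tx` field and `openTransaction` helper are assumptions.

```typescript
interface Tx {
  id: number;
}

// Hypothetical context: carries the active transaction when one is open.
interface ServiceContext {
  tx?: Tx;
}

let openedTransactions = 0;

// Stand-in for the database driver opening a real transaction.
function openTransaction<T>(fn: (tx: Tx) => T): T {
  openedTransactions += 1;
  return fn({ id: openedTransactions });
}

// Reuses the caller's transaction if present; otherwise opens a new one.
function withTransaction<T>(ctx: ServiceContext, fn: (tx: Tx) => T): T {
  return ctx.tx ? fn(ctx.tx) : openTransaction(fn);
}

function updatePageGeneral(ctx: ServiceContext, log: string[]): void {
  withTransaction(ctx, (tx) => {
    log.push(`general@tx${tx.id}`);
  });
}

function updatePageLinks(ctx: ServiceContext, log: string[]): void {
  withTransaction(ctx, (tx) => {
    log.push(`links@tx${tx.id}`);
  });
}

// Handler-level orchestration: one outer transaction shared by every section update.
function updateStatusPage(log: string[]): void {
  withTransaction({}, (tx) => {
    const ctx: ServiceContext = { tx };
    updatePageGeneral(ctx, log);
    updatePageLinks(ctx, log);
  });
}
```

Because both section updates see `ctx.tx`, a throw from either one would propagate out of the single outer transaction instead of committing half the changes.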
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
### Services (`packages/services/src/{workspace,user,invitation,api-key}/`)
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the
paid-plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
### tRPC (`packages/api/src/router/{workspace,user,invitation,apiKey}.ts`)
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
### Enforcement
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
### Cleanup
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
### Deliberately out of scope
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
### Tests
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy /
bad-format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
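The P1 fix replaces read-then-write with a guarded update whose result decides the outcome. The sketch below models that with an in-memory table; `markAccepted` stands in for an `UPDATE ... WHERE id = ? AND accepted_at IS NULL RETURNING id`, and all names other than `ConflictError` are hypothetical. Being single-threaded, it shows the guard logic rather than an actual race.

```typescript
interface Invitation {
  id: number;
  acceptedAt: Date | null;
}

class ConflictError extends Error {}

// Check and write happen in one step; returns the ids that were actually updated.
function markAccepted(table: Invitation[], id: number, now: Date): number[] {
  const updated: number[] = [];
  for (const row of table) {
    if (row.id === id && row.acceptedAt === null) {
      row.acceptedAt = now;
      updated.push(row.id);
    }
  }
  return updated;
}

function acceptInvitation(
  table: Invitation[],
  memberships: number[],
  invitationId: number,
  userId: number,
): void {
  const updated = markAccepted(table, invitationId, new Date());
  // Zero rows means another accept won; abort before touching memberships.
  if (updated.length === 0) {
    throw new ConflictError("Invitation already accepted");
  }
  memberships.push(userId);
}

const invitations: Invitation[] = [{ id: 1, acceptedAt: null }];
const memberships: number[] = [];
acceptInvitation(invitations, memberships, 1, 42);
```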
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
### Services (`packages/services/src/import/`)
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
### tRPC (`packages/api/src/router/import.ts`)
124 lines → 28 lines. The router is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
### Error shape changes
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
### Tests
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
### Enforcement
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
### Cleanup
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
### Per-resource audit
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action |
| --- | --- |
| page | `page.create` |
| componentGroups | `page_component_group.create` |
| components | `page_component.create` |
| monitors | `monitor.create` |
| incidents | `status_report.create` + `status_report.add_update` per update |
| maintenances | `maintenance.create` |
| subscribers | `page_subscriber.create` |
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
To support this, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
### Cubic review fixes
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a non-
Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
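A minimal sketch of the exhaustive-switch pattern from the `provider.ts` item, assuming a single-member provider union. `ProviderName`, `assertNever`, and `adapterLabel` are names invented for the example; the real `createProvider` returns an adapter rather than a string.

```typescript
// Assumed for illustration: the only supported provider today.
type ProviderName = "statuspage";

// Compile-time exhaustiveness check plus a runtime throw for values that
// bypass the type system (e.g. unvalidated input cast to ProviderName).
function assertNever(value: never): never {
  throw new Error(`Unknown import provider: ${String(value)}`);
}

function adapterLabel(name: ProviderName): string {
  switch (name) {
    case "statuspage":
      return "statuspage-adapter";
    default:
      // `name` is narrowed to `never` here; adding a new union member
      // without a matching case becomes a type error on this line.
      return assertNever(name);
  }
}
```

With this shape, a typo in the provider name fails loudly instead of routing a non-Statuspage key to the Statuspage adapter.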
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
### Changes
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
### Still out of scope (follow-ups)
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one
documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
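The derivation can be sketched as follows. The `Actor` union here is a hypothetical stand-in (the real `ctx.actor` type is not shown in this log), and `tryGetActorUserId` is re-implemented locally with an assumed signature purely for illustration.

```typescript
// Hypothetical actor shape, assumed for this example.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string };

class UnauthorizedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnauthorizedError";
  }
}

// Local stand-in: returns the user id for user actors, null otherwise.
function tryGetActorUserId(actor: Actor): number | null {
  return actor.type === "user" ? actor.userId : null;
}

// The membership target comes from the authenticated actor, never from input.
function resolveAcceptingUserId(actor: Actor): number {
  const userId = tryGetActorUserId(actor);
  if (userId === null) {
    throw new UnauthorizedError("Only user actors can accept an invitation");
  }
  return userId;
}
```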
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor — the concurrent-accept race test is covered by the conditional
UPDATE + `ConflictError` path from the earlier P1 fix; mocking it
reliably against SQLite is noisy and not worth the test complexity.
Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
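A rough sketch of the null filter plus early return, with the database lookup abstracted behind a hypothetical `lookupUsers` callback (standing in for the `inArray` query). Row shapes are assumptions for the example.

```typescript
interface ApiKeyRow {
  id: number;
  createdById: number | null;
}

interface User {
  id: number;
  name: string;
}

type EnrichedApiKey = ApiKeyRow & { createdBy: User | undefined };

function enrichWithCreators(
  keys: ApiKeyRow[],
  lookupUsers: (ids: number[]) => User[],
): EnrichedApiKey[] {
  // Drop legacy null creators and de-duplicate before querying.
  const creatorIds = Array.from(
    new Set(
      keys
        .map((key) => key.createdById)
        .filter((id): id is number => id !== null),
    ),
  );

  // Early return: never issue the lookup with an empty id list.
  if (creatorIds.length === 0) {
    return keys.map((key) => ({ ...key, createdBy: undefined }));
  }

  const usersById = new Map(
    lookupUsers(creatorIds).map((user): [number, User] => [user.id, user]),
  );
  return keys.map((key) => ({
    ...key,
    createdBy:
      key.createdById === null ? undefined : usersById.get(key.createdById),
  }));
}
```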
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/
router/import.ts` and `packages/services/src/import/limits.ts` — both
live in the next PR up the stack (#2111 / feat/services-import) and
will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Now short-circuits to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
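A minimal sketch of the clamping rule, under stated assumptions: only `"30s"` as the fastest tier is confirmed above, so the remaining tiers in `PERIODICITY_ORDER` are illustrative, and returning `undefined` when nothing qualifies is a placeholder for whatever fallback the real function uses.

```typescript
// Fastest to slowest. Tiers other than "30s" are assumed for the example.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicity(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity | undefined {
  const isAllowed = (tier: Periodicity): boolean => allowed.indexOf(tier) !== -1;
  const index = (PERIODICITY_ORDER as readonly string[]).indexOf(requested);

  // Unknown value: scan from the slowest tier so the result can never be
  // faster than intended. Known value: walk from the requested tier towards
  // slower ones and take the first allowed tier.
  const candidates =
    index === -1
      ? PERIODICITY_ORDER.slice().reverse()
      : PERIODICITY_ORDER.slice(index);

  return candidates.find(isAllowed);
}
```

The key point is that `indexOf` returning `-1` is handled explicitly instead of being clamped to index `0`, which was the source of the bug.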
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
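The catch-and-rethrow can be sketched like this. Both error classes are local stand-ins: the real handler rethrows via `slugAlreadyExistsError(req.slug)`, whose implementation is not shown here, and `withConsistentSlugError` is a name invented for the example.

```typescript
// Stand-in for the service-layer error.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConflictError";
  }
}

// Stand-in for the transport-level error mapped to `Code.AlreadyExists`.
class SlugAlreadyExistsError extends Error {
  constructor(slug: string) {
    super(`Slug already exists: ${slug}`);
    this.name = "SlugAlreadyExistsError";
  }
}

// Wraps the service call so a lost race surfaces with the same error
// as a failed pre-check. Unrelated errors pass through untouched.
function withConsistentSlugError<T>(slug: string, call: () => T): T {
  try {
    return call();
  } catch (err) {
    if (err instanceof Error && err.name === "ConflictError") {
      throw new SlugAlreadyExistsError(slug);
    }
    throw err;
  }
}
```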
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would hand `null` to `JSON.parse` in `updateNotification` and
break the partial-update read-modify-write. Added `?? "{}"` fallback
and a comment pointing at the schema.
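A small sketch of the fallback in a read-modify-write merge. `mergeNotificationData` is a hypothetical helper written for this example; the real update path is not shown in this log.

```typescript
// Merges a partial patch into the stored JSON payload. The `?? "{}"`
// fallback guarantees the merge base is always an object, even when the
// column holds NULL.
function mergeNotificationData(
  existingData: string | null,
  patch: Record<string, unknown>,
): string {
  const current = JSON.parse(existingData ?? "{}") as Record<string, unknown>;
  return JSON.stringify({ ...current, ...patch });
}
```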
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
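The difference between the two operators is easy to see in isolation. The helper names below are made up for the example:

```typescript
// Old behaviour: `??` only replaces null/undefined, so "" slips through.
function joinWithNullish(values: string[] | null | undefined): string | null {
  return values?.join(",") ?? null;
}

// New behaviour: `||` also replaces the empty string produced by [].join(",").
function joinOrNull(values: string[] | null | undefined): string | null {
  return values?.join(",") || null;
}
```

`joinWithNullish([])` yields `""`, while `joinOrNull([])` yields `null`; both return `"a.com,b.com"` for `["a.com", "b.com"]`.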
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
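The sequential approach can be sketched as below. This is a synchronous, in-memory illustration: `insertGroup` is a hypothetical single-row insert callback, whereas the real code awaits a drizzle insert per group.

```typescript
interface GroupInput {
  name: string;
  componentNames: string[];
}

// Hypothetical single-row insert: returns the id assigned to this group.
type InsertGroup = (name: string) => number;

// Each group id is captured in the same iteration that inserted it, so the
// component-to-group mapping never depends on the order of returned rows.
function assignComponentsToGroups(
  groups: GroupInput[],
  insertGroup: InsertGroup,
): Map<string, number> {
  const groupIdByComponent = new Map<string, number>();
  for (const group of groups) {
    const groupId = insertGroup(group.name);
    for (const componentName of group.componentNames) {
      groupIdByComponent.set(componentName, groupId);
    }
  }
  return groupIdByComponent;
}
```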
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. Now
uses the joined value directly: one round-trip instead of two, which
also closes the read-skew window where a just-renamed workspace
could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
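The fallback amounts to choosing the audit entity from what actually exists. A minimal sketch, with `importRunAuditEntity` and the `AuditEntity` type invented for the example:

```typescript
type AuditEntity =
  | { entityType: "page"; entityId: number }
  | { entityType: "workspace"; entityId: number };

// Attribute the run to the target page when there is one; otherwise fall
// back to the workspace instead of writing a placeholder page id of 0.
function importRunAuditEntity(
  targetPageId: number | null | undefined,
  workspaceId: number,
): AuditEntity {
  if (targetPageId === null || targetPageId === undefined) {
    return { entityType: "workspace", entityId: workspaceId };
  }
  return { entityType: "page", entityId: targetPageId };
}
```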
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
**P2 — `writeComponentsPhase` silent monitor→static fallback.**
When the source monitor failed to resolve (e.g. `includeMonitors ===
false`), the component was silently degraded to `type = "static"`
and reported as `created` with no explanation. Other phase writers
populate `resource.error` on any degrade path. Added a matching
error string pointing at the source monitor id (or lack thereof) so
the summary conveys the degrade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): revert discriminated union to flat + .refine
The previous commit switched `componentInput` in the service schema to
a `z.discriminatedUnion("type", [...])` to get a clean `ZodError` at
parse time for the monitor/static invariant. That produced a narrowed
TS shape (`type: "monitor"` → required `monitorId: number`) that every
caller had to match — including the dashboard form, where
react-hook-form can't model discriminated unions cleanly and would
have needed a flat→union adapter at submit time. The ripple was
user-visible frontend churn for a schema-layer concern.
Switched back to a flat `z.object` + cross-field `.refine` on
`(type, monitorId)`. Same parse-time rejection Cubic asked for
(ZodError with a specific path, not an opaque SQLite CHECK failure),
but the inferred TS type stays flat so callers — router input, RHF
form values — keep their existing shape.
Also restored the downstream `&& c.monitorId` guards and
`as number[]` casts in `update-order.ts`. With a flat schema, TS
still sees `monitorId: number | null | undefined` on the monitor
branch; the refine rejects violating input at parse time, but the
guard is needed to narrow for the type system. Matches the
pre-migration shape exactly.
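To make the trade-off concrete, here is a dependency-free stand-in for the cross-field check. The real schema uses zod's `.refine`; `refineComponent`, `monitorIdsOf`, and the `ComponentInput` fields other than `type` and `monitorId` are assumptions for this sketch.

```typescript
// Flat shape: the inferred type does not change with `type`.
interface ComponentInput {
  type: "monitor" | "static";
  name: string;
  monitorId?: number | null;
}

interface Issue {
  path: string[];
  message: string;
}

// Cross-field rule on (type, monitorId), reported with a specific path.
function refineComponent(component: ComponentInput): Issue[] {
  const hasMonitorId =
    component.monitorId !== null && component.monitorId !== undefined;
  if (component.type === "monitor" && !hasMonitorId) {
    return [{ path: ["monitorId"], message: "monitor components need a monitorId" }];
  }
  if (component.type === "static" && hasMonitorId) {
    return [{ path: ["monitorId"], message: "static components must not carry a monitorId" }];
  }
  return [];
}

// Because the type stays flat, downstream code still needs the truthiness
// guard to narrow `monitorId` from `number | null | undefined` to `number`.
function monitorIdsOf(components: ComponentInput[]): number[] {
  const ids: number[] = [];
  for (const component of components) {
    if (component.type === "monitor" && component.monitorId) {
      ids.push(component.monitorId);
    }
  }
  return ids;
}
```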
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): reconcile component links on idempotent skip
The idempotency checks in `writeIncidentsPhase` and
`writeMaintenancesPhase` added in the previous pass correctly avoid
duplicate status-report / maintenance rows on rerun, but `continue`-d
out of the writer before the component-link insertion block. The
failure mode this leaves open:
1. Run 1: component phase uses per-resource catch, so a single
component can fail and leave `componentIdMap` partial.
2. The report/maintenance is written with a subset of the intended
links — only the entries whose source id resolved in the map.
3. Run 2: the previously-failed component now succeeds and lands in
`componentIdMap`. The report/maintenance idempotency check hits,
`continue` fires, and the still-missing link is never written.
Both join tables (`statusReportsToPageComponents`,
`maintenancesToPageComponents`) have a composite primary key on
`(parentId, pageComponentId)`. Running the same link-build pass on
the skip path with `.onConflictDoNothing()` is a no-op for the links
already present and adds any that resolved this time round. Matches
the "reruns converge to correct state" model that motivated the
idempotency checks in the first place.
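The convergence argument can be checked with a small in-memory model. This is only a sketch: a `Set` of `parentId:componentId` keys stands in for the join table's composite primary key, and `reconcileLinks` is a hypothetical helper mimicking an insert with `.onConflictDoNothing()`.

```typescript
// Inserts every link whose source component resolves in the map, skipping
// links that already exist. Returns how many links were newly added.
function reconcileLinks(
  links: Set<string>,
  parentId: number,
  sourceComponentIds: string[],
  componentIdMap: Map<string, number>,
): number {
  let added = 0;
  for (const sourceId of sourceComponentIds) {
    const componentId = componentIdMap.get(sourceId);
    if (componentId === undefined) continue; // component not imported (yet)
    const key = `${parentId}:${componentId}`;
    if (links.has(key)) continue; // conflict on the composite key: no-op
    links.add(key);
    added += 1;
  }
  return added;
}
```

Running it once with a partial map and again with the completed map adds only the missing link on the second pass, and a third pass adds nothing.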
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Claude review pass
Three small findings from the latest Claude review, bundled:
**#1 — `list.ts` redundant `conditions` array.** `listNotifications`
initialised a one-element `SQL[]` and spread it into `and(...)` with
no second push site anywhere. Collapsed to a direct `eq(...)`. The
pattern is load-bearing in `monitor/list.ts` (two conditions: workspace
scope + soft-delete filter) but notifications have no soft-delete
column, so the indirection was pure noise.
**#2 — router `dataInputSchema` duplicated the service schema.**
`packages/api/src/router/notification.ts` hand-rolled a
`z.partialRecord` structurally identical to the `dataSchema` inside
`packages/services/src/notification/schemas.ts`. Drift hazard: a
future provider-value shape change in the service would be accepted
at the tRPC layer and fail with an opaque error deeper in
`validateNotificationData`. Renamed the service schema to
`NotificationDataInputSchema`, exported it from the service barrel,
and replaced the router's local copy + the now-unused
`servicesNotificationProvider` alias.
**#5 — `update.ts` audit missing `provider` metadata.**
`createNotification` attaches `metadata: { provider: input.provider
}` to its audit row; `updateNotification` didn't. The `provider`
column is recoverable from the `before`/`after` rows, but asymmetric
metadata breaks simple `action + metadata.provider` audit queries.
Added `metadata: { provider: existing.provider }` for parity.
Skipped the two non-fixes: the `enrichNotificationsBatch` drizzle-cast
fragility is the same pattern as `monitor/list.ts`, worth a codebase-
wide change rather than a single-domain carve-out; the `dataSchema`
being "intentionally wide" is already called out in the schema JSDoc
and is correct by design (provider/payload alignment is enforced at
the service boundary by `validateNotificationData`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(services/notification): cover update audit + post-downgrade gate
Claude review noted two gaps in the `updateNotification` suite. Adding
both:
**`notification.update` audit row.** `createNotification` already
asserts an audit row fires with `expectAuditRow`; the update path was
silent. With the per-mutation audit contract (every write emits a
row), the update case needs an equivalent pin so a regression that
drops the emit site is caught. Note: the v1 audit buffer shape
(`AuditLogRecord`) doesn't carry `metadata`, so the `{ provider }`
payload can't be asserted directly here — that coverage lands with
the v2 audit-table move, called out in the test comment.
**Plan-gate after a downgrade.** The Cubic-flagged fix added
`assertProviderAllowed(existing.provider)` to `updateNotification`
so a previously-allowed channel becomes read-only once the workspace
drops to a plan that no longer includes it. The regression test
simulates the downgrade by directly inserting a `pagerduty` row into
the free workspace (bypasses the create-time gate) and then calls
`updateNotification` via `freeCtx` — asserts `LimitExceededError`.
Without the gate the update would silently succeed and leave a
channel on an unsupported plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic P1/P2/P3 pass across the stack
Twelve findings from Cubic's 2026-04-24 review on #2106. Fixes are
bundled on this branch because the later-PR content has cascaded
down via the 2107 squash-merge — each item is called out by file so
the commit is easy to skim.
**P1 — `import/phase-writers.ts` subscribers reconciliation.**
`writeIncidentsPhase` and `writeMaintenancesPhase` already
reconcile component links on the idempotent-skip path.
`writeSubscribersPhase` didn't, so `pageSubscriberToPageComponent`
rows lost to a partial first run are never recovered. Same
`.onConflictDoNothing()` pattern, same composite-PK rationale.
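The reconciliation idea can be sketched without a database. The `LinkTable` class below is an in-memory stand-in (an assumption, not the drizzle API) whose insert behaves like `INSERT ... ON CONFLICT DO NOTHING` on a composite `(subscriberId, componentId)` key.

```typescript
// Illustrative link row; field names are assumptions.
type Link = { subscriberId: number; componentId: number };

class LinkTable {
  private readonly rows = new Map<string, Link>();

  // Mimics ON CONFLICT DO NOTHING: returns true only when a row is written.
  insertOnConflictDoNothing(link: Link): boolean {
    const key = `${link.subscriberId}:${link.componentId}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, link);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Safe to run on both the fresh-insert path and the idempotent-skip path.
function reconcileLinks(
  table: LinkTable,
  subscriberId: number,
  componentIds: number[],
): number {
  let written = 0;
  for (const componentId of componentIds) {
    if (table.insertOnConflictDoNothing({ subscriberId, componentId })) {
      written += 1;
    }
  }
  return written;
}

const table = new LinkTable();
// Partial first run: only one of three links was written.
table.insertOnConflictDoNothing({ subscriberId: 1, componentId: 10 });
// Second run hits the skip path for the subscriber but still reconciles links.
const recovered = reconcileLinks(table, 1, [10, 11, 12]);
```

Here `recovered` is 2 and the table ends with three rows, so rerunning the import restores the missing links without duplicating the existing one.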
**P1 — `page/schemas.ts:129` UpdatePageAppearance accepts any theme.**
`configuration.theme` was `z.string()`, but the read path parses
configuration through `pageConfigurationSchema`, which uses the
`THEME_KEYS` enum. A write can no longer persist a theme that the read
path then fails to parse. Router now reuses `UpdatePageAppearanceInput`
from services so the tRPC boundary enforces the same enum.
**P1 — `page/schemas.ts:160` UpdatePageConfiguration too permissive.**
Replaced the `z.record(z.string(), z.string()|z.boolean())` with
`pageConfigurationSchema.nullish()`, which is what the read path
already uses. Dashboard form's free-typed `configuration` values are
cast at the mutation call-site (tRPC input parse catches invalid
submits).
**P2 — `page-component/internal.ts:39` validateMonitorIds.**
Added `isNull(monitor.deletedAt)` so tombstoned monitor ids don't
pass as valid attach targets.
**P2 — `page-component/list.ts:71` enrichment returns deleted monitors.**
Same `isNull(monitor.deletedAt)` filter on the enrichment lookup so
`component.monitor` is never populated with a tombstoned row.
…
* feat(services): migrate monitor domain tRPC onto service layer
Fourth domain, the largest so far. Migrates the **tRPC** monitor router
(all 14 procedures) onto `@openstatus/services/monitor`. The Connect
handler (`apps/server/src/routes/rpc/services/monitor/`) and the v1 REST
`apps/server/src/routes/v1/monitors/*` endpoints stay untouched for now
— they are separate external-API surfaces and warrant their own
follow-up PRs with their own test suites.
- `createMonitor`, `cloneMonitor`, `deleteMonitor`, `deleteMonitors`
(bulk soft-delete), `getMonitor`, `listMonitors`.
- `updateMonitorGeneral` — preserves the existing tRPC `updateGeneral`
behaviour including jobType switching (HTTP ↔ TCP ↔ DNS). Called out
as a code smell in the explore, but preserved intentionally since
it's the dashboard's current edit flow. Kept jobType-agnostic rather
than split into 3 separate update methods.
- `updateMonitorRetry`, `updateMonitorFollowRedirects`,
`updateMonitorOtel`, `updateMonitorPublic`,
`updateMonitorResponseTime` — field-specific setters.
- `updateMonitorSchedulingRegions` — enforces plan limits
(periodicity / region-count / region-access) + validates the
private-location ids before replacing the association set.
- `updateMonitorTags`, `updateMonitorNotifiers` — validate and
replace the tag / notifier association sets.
- `bulkUpdateMonitors` — batched toggle of `public` / `active`
across multiple monitor ids. Matches the old `updateMonitors`
procedure.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- Cascade deletes: `monitor_tag_to_monitor`,
`notifications_to_monitors`, `page_component` rows are torn down on
delete (matches pre-migration behaviour — bypasses FK cascades
because some rows reference the monitor without cascade).
- `validateTagIds` / `validateNotificationIds` / `validatePrivateLocationIds`
— workspace-scoped validators with dedupe.
- `pickDefaultRegions(workspace)` — ports the old "randomly pick 4
free / 6 paid regions excluding deprecated" logic.
- `serialiseAssertions` / `headersToDbJson` — assertion / header
serialisation moved out of the tRPC router.
- `countMonitorsInWorkspace` for quota checks.
`listMonitors` does two IN queries (tags + incidents) regardless of
list size. `getMonitor` reuses the same path with the richer
`{ notifications, privateLocations }` toggle and a singleton. Same
pattern as status-report's batched fix.
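A rough sketch of the batched-enrichment shape, reduced to one relation (tags) for brevity. `selectTagsIn` is a hypothetical stand-in for a single `WHERE monitor_id IN (...)` query; the counter only exists to show that the query count does not grow with the list.

```typescript
type Monitor = { id: number; name: string };
type TagLink = { monitorId: number; tag: string };
type EnrichedMonitor = Monitor & { tags: string[] };

let queryCount = 0;

// Stand-in for one IN query against the tag link table.
function selectTagsIn(allLinks: TagLink[], monitorIds: number[]): TagLink[] {
  queryCount += 1;
  const wanted = new Set(monitorIds);
  return allLinks.filter((link) => wanted.has(link.monitorId));
}

function enrichMonitorsBatch(
  monitors: Monitor[],
  allLinks: TagLink[],
): EnrichedMonitor[] {
  // One query for the whole list instead of one query per monitor.
  const links = selectTagsIn(allLinks, monitors.map((m) => m.id));
  const byMonitor = new Map<number, string[]>();
  for (const link of links) {
    const bucket = byMonitor.get(link.monitorId) ?? [];
    bucket.push(link.tag);
    byMonitor.set(link.monitorId, bucket);
  }
  return monitors.map((m) => ({ ...m, tags: byMonitor.get(m.id) ?? [] }));
}

const links: TagLink[] = [
  { monitorId: 1, tag: "api" },
  { monitorId: 1, tag: "prod" },
  { monitorId: 3, tag: "db" },
  { monitorId: 99, tag: "other" },
];
const enriched = enrichMonitorsBatch(
  [
    { id: 1, name: "a" },
    { id: 2, name: "b" },
    { id: 3, name: "c" },
  ],
  links,
);
```

A single-monitor `get` can pass a one-element array through the same function, which is the "singleton" reuse described above.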
- **tRPC** (`packages/api/src/router/monitor.ts`): every procedure is a
thin wrapper. `delete` catches `NotFoundError` for idempotency.
`new` / `updateGeneral` keep the `testHttp` / `testTcp` / `testDns`
pre-save check at the tRPC layer — services unconditionally save,
callers decide whether to pre-validate.
- Connect RPC handler at `apps/server/src/routes/rpc/services/monitor/`
— still uses its own helpers. Will be migrated in a follow-up
(4b).
- v1 REST endpoints at `apps/server/src/routes/v1/monitors/*` — yet
another external surface; dedicated PR later.
- `packages/api/src/service/import.ts` monitor writes — covered by
the plan's dedicated PR 9.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/monitor.ts`. The Connect monitor handler
stays out of scope until 4b.
- Subpath export `@openstatus/services/monitor`.
- `packages/services/src/monitor/__tests__/monitor.test.ts` —
create (http / tcp / dns), delete cascade, bulk delete, clone,
tags / notifiers validation (forbidden when cross-workspace),
list / get with workspace isolation + soft-delete hiding,
updateMonitorGeneral round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): declare @openstatus/regions + @openstatus/assertions deps
The monitor domain pulls `regionDict` from `@openstatus/regions` (for
the default-region picker) and assertion classes / validators from
`@openstatus/assertions`. Both were missing from
`packages/services/package.json`, so Vercel's Next.js build for
`apps/status-page` — which transitively imports the services package
via `@openstatus/api` — failed with "Module not found" on
`packages/services/src/monitor/internal.ts:20` and `schemas.ts:1`.
Local pnpm workspaces resolved the imports via hoisting, which masked
the missing declarations until an actual Next.js build forced strict
resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/monitor): address Cubic review
- **`timeout` / `degradedAfter` bounds** (schemas.ts) — mirror the
0–60_000 ms cap from `insertMonitorSchema`. Values outside the range
are rejected before the UPDATE instead of being silently persisted.
- **`jsonBody` assertion mapping** (internal.ts) — the serialiser was
silently dropping `jsonBody`-typed input because the runtime class
wasn't wired up. Added the `JsonBodyAssertion` branch; the class has
existed in `@openstatus/assertions` the whole time, this was just a
missing case.
- **Clone resets `status`** (clone.ts) — cloning no longer inherits the
source's current `error`/`degraded` health. Freshly cloned monitors
start at `"active"` and settle on their first check.
- **Delete filters soft-deleted** (delete.ts) — the pre-check now
includes `isNull(monitor.deletedAt)`, so a repeat delete returns
`NotFoundError` (preserved as idempotent at the tRPC layer) instead
of re-running the cascades and emitting duplicate audits.
- **List/get workspace scope on relations** (list.ts) — `enrichMonitorsBatch`
takes `workspaceId` and scopes the incident / tag / notification /
private-location IN queries to the caller's workspace. Defence-in-depth
against inconsistent FK data (none of those tables enforce workspace
ownership at the FK level).
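The soft-delete pre-check and the idempotent transport wrapper fit together roughly as follows. This is an in-memory sketch: `MonitorRow`, `auditLog`, and `deleteProcedure` are illustrative names, not the real service or router code.

```typescript
class NotFoundError extends Error {}

type MonitorRow = { id: number; deletedAt: Date | null };

const auditLog: string[] = [];

function deleteMonitor(rows: MonitorRow[], id: number): void {
  // Pre-check ignores tombstoned rows, like `isNull(monitor.deletedAt)`.
  const row = rows.find((r) => r.id === id && r.deletedAt === null);
  if (!row) throw new NotFoundError(`monitor ${id} not found`);
  row.deletedAt = new Date();
  auditLog.push(`monitor.delete:${id}`);
}

// Transport wrapper: swallows NotFoundError so a repeat delete is a no-op.
function deleteProcedure(rows: MonitorRow[], id: number): void {
  try {
    deleteMonitor(rows, id);
  } catch (err) {
    if (!(err instanceof NotFoundError)) throw err;
  }
}

const rows: MonitorRow[] = [{ id: 7, deletedAt: null }];
deleteProcedure(rows, 7);
deleteProcedure(rows, 7); // second call: no cascade rerun, no duplicate audit
```

After both calls `auditLog` holds a single entry, which is the behaviour the fix restores.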
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate notification CRUD (PR 5/N) (#2106)
* feat(services): migrate notification CRUD onto service layer
Fifth domain. Migrates the four CRUD procedures — `list`, `new`,
`updateNotifier`, `delete` — of the notification tRPC router onto
`@openstatus/services/notification`. The three integration-helper
procedures (`sendTest`, `createTelegramToken`, `getTelegramUpdates`)
stay inline at the tRPC layer and are explicitly out of scope for this
migration.
- `createNotification` — enforces plan limits on both the channel count
(`notification-channels`) and the provider itself (`sms` / `pagerduty`
/ `opsgenie` / `grafana-oncall` / `whatsapp` require plan flags);
validates the loose `data` payload against `NotificationDataSchema`;
validates monitor ids are in-workspace and not soft-deleted.
- `updateNotification` — replaces name / data / monitor associations in
a single transaction with the same validation rules as create.
- `deleteNotification` — hard-delete (FK cascade clears associations).
The tRPC wrapper swallows `NotFoundError` to preserve the old
idempotent behaviour.
- `listNotifications`, `getNotification` — batched IN query enriches
monitors per notification. Monitor enrichment is workspace-scoped and
filters soft-deleted monitors for defence-in-depth.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- **tRPC** (`packages/api/src/router/notification.ts`): `list` / `new` /
`updateNotifier` / `delete` become thin service wrappers. `sendTest`
+ `createTelegramToken` + `getTelegramUpdates` are unchanged.
- **`sendTest` migration** — the dispatch switch imports from 10
`@openstatus/notification-*` packages. Moving it into services would
pull those as direct deps; the plan's phrasing ("Channel CRUD +
test-dispatch") allows this as a later extraction.
- **`createTelegramToken` / `getTelegramUpdates`** — redis + external
Telegram API helpers; transport UX, not domain operations.
- **Biome scope for `notification.ts`** — the file still imports
`@openstatus/db/src/schema` for the `sendTest` provider data schemas.
Will land with the sendTest migration follow-up.
- **Connect RPC notification handler** — stays on its own helpers;
follow-up aligned with PR 4's Connect deferral.
- `__tests__/notification.test.ts` covers create (including
`ValidationError` on malformed data, `LimitExceededError` on gated
provider, `ForbiddenError` on cross-workspace monitor), update
(association replacement, cross-workspace `NotFoundError`), delete,
list/get workspace isolation + monitor enrichment scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Cubic review
- **Update flow plan gate** (update.ts) — `updateNotification` was
skipping `assertProviderAllowed`, so a user who downgraded their plan
could still edit a notification configured with a now-restricted
provider. Re-check against the stored `existing.provider` to match
the create-time gate.
- **Provider / data match** (internal.ts) — `NotificationDataSchema` is
a union, so `{ provider: "discord", data: { slack: "…" } }` passed
the union check even though the payload key doesn't match the
provider. `validateNotificationData` now takes the provider and
asserts `provider in data` after the top-level parse. Applied in
both `create` and `update` (update uses the stored provider since
the API doesn't allow provider changes).
Added a test for the mismatch case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): tighten data validation against provider schema
Cubic's follow-up on the previous fix was right: checking only
`provider in data` isn't enough. `NotificationDataSchema` is a union,
so a payload like `{ discord: "not-a-url", slack: "valid-url" }` passes
because the union matches the slack variant — the extra `discord` key
is ignored, and my key-presence check sees `"discord"` and lets it
through.
Replaced the union parse + key check with a provider-specific schema
lookup (`providerDataSchemas[provider].safeParse(data)`). Each
canonical channel schema is keyed by its provider name and validates
the shape / content of the value, so the new check catches both the
mismatched-provider and malformed-payload cases in one pass.
Added a test covering the exact case Cubic flagged — invalid `discord`
URL alongside a valid `slack` URL now rejects with ValidationError.
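To make the difference concrete, here is a zod-free sketch with hand-written validators. The two providers, the HTTPS check, and the function names other than `providerDataSchemas` are assumptions; the real schemas are zod objects with `safeParse`.

```typescript
type Provider = "discord" | "slack";
type Validator = (data: Record<string, unknown>) => boolean;

function isHttpsUrl(value: unknown): boolean {
  return typeof value === "string" && /^https:\/\/\S+$/.test(value);
}

// One validator per provider, keyed by provider name.
const providerDataSchemas: Record<Provider, Validator> = {
  discord: (data) => isHttpsUrl(data.discord),
  slack: (data) => isHttpsUrl(data.slack),
};

// Previous approach: any union variant may match, then check key presence.
function unionThenKeyCheck(
  provider: Provider,
  data: Record<string, unknown>,
): boolean {
  const unionMatches = Object.values(providerDataSchemas).some((validate) =>
    validate(data),
  );
  return unionMatches && provider in data;
}

// Current approach: validate only against the declared provider's schema.
function providerLookup(
  provider: Provider,
  data: Record<string, unknown>,
): boolean {
  return providerDataSchemas[provider](data);
}

const payload = { discord: "not-a-url", slack: "https://hooks.example.com/abc" };
const oldResult = unionThenKeyCheck("discord", payload);
const newResult = providerLookup("discord", payload);
```

With this payload the old check accepts (`oldResult` is `true`) because the slack variant matches and the `discord` key is present, while the provider lookup rejects it (`newResult` is `false`).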
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain (PR 6/N) (#2107)
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
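The dedupe itself is a one-liner; the helper name below is hypothetical and only wraps the expression quoted above.

```typescript
// Returns the ids in first-seen order with duplicates removed.
function echoMonitorIds(monitorIds: string[]): string[] {
  // A Set keeps insertion order, so the output order is stable.
  return Array.from(new Set(monitorIds));
}

const echoed = echoMonitorIds(["1", "1", "2"]); // ["1", "2"]
```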
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
**Services** (`packages/services/src/page/`):
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection,
ip-restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
**tRPC** (`packages/api/src/router/page.ts`): all 13 procedures are thin
wrappers. `delete` catches `NotFoundError` for idempotency.
`updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
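The three steps above can be sketched as a small orchestration function. The `DomainOps` callbacks, the empty-string convention for "no domain", and the add-before-remove order are assumptions made for the example; the actual router may order or guard the Vercel calls differently.

```typescript
type DomainOps = {
  addDomainToVercel: (domain: string) => void;
  removeDomainFromVercel: (domain: string) => void;
  persist: (domain: string) => void; // stands in for updatePageCustomDomain
};

function updateCustomDomain(
  previous: string, // read via getPage in the real flow
  next: string,
  ops: DomainOps,
): void {
  if (previous !== next) {
    // Assumption: an empty string means the page has no custom domain.
    if (next !== "") ops.addDomainToVercel(next);
    if (previous !== "") ops.removeDomainFromVercel(previous);
  }
  ops.persist(next);
}

const calls: string[] = [];
const recordingOps: DomainOps = {
  addDomainToVercel: (d) => calls.push(`add:${d}`),
  removeDomainFromVercel: (d) => calls.push(`remove:${d}`),
  persist: (d) => calls.push(`persist:${d}`),
};

updateCustomDomain("old.example.com", "new.example.com", recordingOps);
```

Keeping the external calls outside the service means the service stays a pure DB write, which matches the split described above.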
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
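The shared-transaction behaviour described for `update` can be illustrated with a simplified, synchronous `withTransaction`. How the real helper detects a pre-opened tx is not shown in this commit, so carrying it on the context object is an assumption here, as are the `Tx` and `Ctx` shapes.

```typescript
type Tx = { id: number; log: string[] };
type Ctx = { tx?: Tx };

let opened = 0;

// Reuses ctx.tx when present; otherwise opens a new transaction.
function withTransaction<T>(ctx: Ctx, fn: (tx: Tx) => T): T {
  if (ctx.tx) return fn(ctx.tx);
  opened += 1;
  const tx: Tx = { id: opened, log: [] };
  return fn(tx); // a real implementation would commit or roll back here
}

function updatePageGeneral(ctx: Ctx, title: string): void {
  withTransaction(ctx, (tx) => {
    tx.log.push(`general:${title}`);
  });
}

function updatePageLinks(ctx: Ctx, homepage: string): void {
  withTransaction(ctx, (tx) => {
    tx.log.push(`links:${homepage}`);
  });
}

// Handler opens one tx and threads it through the per-section services.
function updateStatusPage(ctx: Ctx): string[] {
  return withTransaction(ctx, (tx) => {
    const inner: Ctx = { ...ctx, tx };
    updatePageGeneral(inner, "Status");
    updatePageLinks(inner, "https://example.com");
    return tx.log;
  });
}

const log = updateStatusPage({});
```

Only one transaction is opened for the whole update, so both section writes land in the same unit of work.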
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
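The `authEmailDomains` fix comes down to `undefined` versus `null` in a patch object. The `applySet` helper below is an in-memory imitation of a `.set()` that skips `undefined` values, as the commit describes; it is not drizzle itself, and the row shape is trimmed to two columns.

```typescript
type PageRow = {
  authEmailDomains: string | null;
  allowedIpRanges: string | null;
};
type Patch = { [K in keyof PageRow]?: PageRow[K] };

// Skips keys whose value is undefined; writes everything else, including null.
function applySet(row: PageRow, patch: Patch): PageRow {
  const next: PageRow = { ...row };
  for (const key of Object.keys(patch) as Array<keyof PageRow>) {
    const value = patch[key];
    if (value !== undefined) next[key] = value;
  }
  return next;
}

// Before the fix: a cleared field becomes undefined and is skipped.
function buggyPatch(domains?: string[]): Patch {
  return { authEmailDomains: domains?.join(",") };
}

// After the fix: a cleared field becomes null and overwrites the column.
function fixedPatch(domains?: string[]): Patch {
  return { authEmailDomains: domains?.join(",") ?? null };
}

const stored: PageRow = { authEmailDomains: "example.com", allowedIpRanges: null };
const before = applySet(stored, buggyPatch(undefined));
const after = applySet(stored, fixedPatch(undefined));
```

`before.authEmailDomains` is still `"example.com"` (the stale value survives), while `after.authEmailDomains` is `null`.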
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the paid-
plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
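A sketch of the wrapper shape, with a local `TRPCErrorLike` class standing in for tRPC's `TRPCError` so the example has no dependencies. The three mappings shown follow the codes mentioned in these commits; the fallback code and the `wrap` helper are assumptions.

```typescript
class NotFoundError extends Error {}
class ForbiddenError extends Error {}
class ValidationError extends Error {}

type TRPCCode = "NOT_FOUND" | "FORBIDDEN" | "BAD_REQUEST" | "INTERNAL_SERVER_ERROR";

// Local stand-in for TRPCError.
class TRPCErrorLike extends Error {
  readonly code: TRPCCode;
  constructor(code: TRPCCode, message: string) {
    super(message);
    this.code = code;
  }
}

function toTRPCError(err: unknown): TRPCErrorLike {
  const message = err instanceof Error ? err.message : "unknown error";
  if (err instanceof NotFoundError) return new TRPCErrorLike("NOT_FOUND", message);
  if (err instanceof ForbiddenError) return new TRPCErrorLike("FORBIDDEN", message);
  if (err instanceof ValidationError) return new TRPCErrorLike("BAD_REQUEST", message);
  // Assumed fallback for anything unrecognised.
  return new TRPCErrorLike("INTERNAL_SERVER_ERROR", message);
}

// Turns a service function into a thin procedure body.
function wrap<I, O>(serviceFn: (input: I) => Promise<O>): (input: I) => Promise<O> {
  return async (input: I) => {
    try {
      return await serviceFn(input);
    } catch (err) {
      throw toTRPCError(err);
    }
  };
}

const getWorkspaceProcedure = wrap(async (id: number) => {
  if (id !== 1) throw new NotFoundError(`workspace ${id} not found`);
  return { id, name: "demo" };
});
```

Because the mapping lives in one function, each router procedure stays a few lines long and non-tRPC callers can map the same service errors to their own transport codes.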
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy /
bad-format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
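The `invitation/accept.ts` fix replaces read-then-write with a guarded write. The sketch below imitates `UPDATE ... WHERE id = ? AND accepted_at IS NULL RETURNING id` over an in-memory array; the function names and row shape are illustrative only.

```typescript
class ConflictError extends Error {}

type Invitation = { id: number; acceptedAt: Date | null };

// Conditional update: only rows that are still unaccepted are touched.
function markAccepted(invitations: Invitation[], id: number, now: Date): number[] {
  const updated: number[] = [];
  for (const inv of invitations) {
    if (inv.id === id && inv.acceptedAt === null) {
      inv.acceptedAt = now;
      updated.push(inv.id);
    }
  }
  return updated;
}

function acceptInvitation(
  invitations: Invitation[],
  memberships: number[],
  id: number,
  userId: number,
): void {
  const updated = markAccepted(invitations, id, new Date());
  // Zero returned rows means another accept won the race.
  if (updated.length === 0) throw new ConflictError("invitation already accepted");
  memberships.push(userId);
}

const invitations: Invitation[] = [{ id: 5, acceptedAt: null }];
const memberships: number[] = [];
acceptInvitation(invitations, memberships, 5, 42);
```

A second `acceptInvitation` call for the same id throws `ConflictError` before the membership insert, so `memberships` keeps a single entry.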
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
Router (`packages/api/src/router/import.ts`): 124 lines → 28 lines. The router is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action
| --- | ---
| page | `page.create`
| componentGroups | `page_component_group.create`
| components | `page_component.create`
| monitors | `monitor.create`
| incidents | `status_report.create` + `status_report.add_update` per update
| maintenances | `maintenance.create`
| subscribers | `page_subscriber.create`
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
To support this, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a non-
Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
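The `provider.ts` fix can be sketched as an exhaustive switch with a `never`-typed default. This is an illustration under assumptions: the `ProviderName` union and the `Provider` return shape are hypothetical, and the real `createProvider` signature may differ.

```typescript
// Hypothetical types; only the "statuspage" provider name comes from the text.
type ProviderName = "statuspage";
type Provider = { name: ProviderName; apiKey: string };

function createProvider(name: ProviderName, apiKey: string): Provider {
  switch (name) {
    case "statuspage":
      return { name, apiKey };
    default: {
      // If a new member is added to ProviderName without a matching case,
      // this assignment stops compiling. At runtime, an unknown name that
      // slipped past the types throws instead of running the wrong adapter.
      const unreachable: never = name;
      throw new Error(`Unknown import provider: ${String(unreachable)}`);
    }
  }
}
```

The design choice is that a typo fails loudly at the boundary rather than handing a key to an adapter it was never meant for.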
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one
documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
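A rough sketch of the actor-derived user id pattern, assuming a simplified `Actor` union. `tryGetActorUserId`, `UnauthorizedError`, and `AcceptInvitationInput` are named in the commit; the actor variants and the `resolveMembershipUserId` helper are hypothetical.

```typescript
// Hypothetical actor variants; the real ServiceContext may model these differently.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string; userId?: number }
  | { type: "system" };

class UnauthorizedError extends Error {}

// Returns the acting user's id, or null for actors without a linked user.
function tryGetActorUserId(actor: Actor): number | null {
  if (actor.type === "user") return actor.userId;
  if (actor.type === "apiKey") return actor.userId ?? null;
  return null;
}

// The input no longer carries a userId, so a caller cannot choose one.
type AcceptInvitationInput = { token: string; email: string };

// Hypothetical helper: decides which user the membership row is inserted for.
function resolveMembershipUserId(
  ctx: { actor: Actor },
  _input: AcceptInvitationInput,
): number {
  const userId = tryGetActorUserId(ctx.actor);
  if (userId === null) {
    throw new UnauthorizedError("Only user actors can accept invitations");
  }
  return userId;
}
```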
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor — the concurrent-accept race test is covered by the conditional
UPDATE + `ConflictError` path from the earlier P1 fix; mocking it
reliably against SQLite is noisy and not worth the test complexity.
Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
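The null filter plus the empty-list early return can be sketched as below. This is an in-memory illustration, not the actual service: `findUsersByIds` is a hypothetical stand-in for the IN query that mimics the empty-list failure described above, and the row types are simplified.

```typescript
type ApiKeyRow = { id: number; name: string; createdById: number | null };
type User = { id: number; email: string };
type ApiKeyWithCreator = ApiKeyRow & { createdBy: User | undefined };

// Stand-in for the IN query. Like the `inArray` behaviour described above,
// it refuses an empty id list.
function findUsersByIds(users: User[], ids: number[]): User[] {
  if (ids.length === 0) throw new Error("At least one value must be provided");
  return users.filter((u) => ids.includes(u.id));
}

function enrichWithCreators(keys: ApiKeyRow[], users: User[]): ApiKeyWithCreator[] {
  // Drop legacy null creators so the id list is a real number[].
  const creatorIds = Array.from(
    new Set(
      keys.map((k) => k.createdById).filter((id): id is number => id !== null),
    ),
  );
  // Early return: never issue the IN query with an empty list.
  if (creatorIds.length === 0) {
    return keys.map((k) => ({ ...k, createdBy: undefined }));
  }
  const byId = new Map(
    findUsersByIds(users, creatorIds).map((u) => [u.id, u] as const),
  );
  return keys.map((k) => ({
    ...k,
    createdBy: k.createdById === null ? undefined : byId.get(k.createdById),
  }));
}
```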
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/
router/import.ts` and `packages/services/src/import/limits.ts` — both
live in the next PR up the stack (#2111 / feat/services-import) and
will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Now short-circuits to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
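A sketch of the clamping logic under stated assumptions. Only `PERIODICITY_ORDER`, `clampPeriodicity`, and `"30s"` as the fastest tier come from the commit; the remaining tier names, the function signature, and the `undefined` return for an empty allowed list are assumptions made for illustration.

```typescript
// Assumed tier list, fastest first. Only "30s" being the fastest is stated above.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicity(
  requested: string,
  allowed: readonly Periodicity[],
): Periodicity | undefined {
  // Allowed tiers, sorted fastest to slowest.
  const allowedSorted = PERIODICITY_ORDER.filter((p) => allowed.includes(p));
  // Assumption: with nothing allowed, the caller decides the fallback.
  if (allowedSorted.length === 0) return undefined;
  const slowest = allowedSorted[allowedSorted.length - 1];

  const start = (PERIODICITY_ORDER as readonly string[]).indexOf(requested);
  // Unknown value: short-circuit to the slowest allowed tier instead of
  // letting indexOf's -1 restart the walk at the fastest tier.
  if (start === -1) return slowest;

  // Walk from the requested tier toward slower ones; never pick a faster tier.
  for (let i = start; i < PERIODICITY_ORDER.length; i++) {
    const candidate = PERIODICITY_ORDER[i];
    if (allowed.includes(candidate)) return candidate;
  }
  // Assumption: nothing at or slower than the request is allowed, so the
  // slowest allowed tier is the closest available choice.
  return slowest;
}
```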
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
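The catch-and-rethrow can be sketched as follows. This is a synchronous simplification with hypothetical shapes: `RpcError` stands in for the Connect error type, the string codes stand in for `Code.AlreadyExists` / `Code.InvalidArgument`, and the service call is passed in as a plain function.

```typescript
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical stand-in for the transport-level error type.
class RpcError extends Error {
  constructor(
    message: string,
    readonly code: "already_exists" | "invalid_argument",
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

function slugAlreadyExistsError(slug: string): RpcError {
  return new RpcError(`Slug "${slug}" already exists`, "already_exists");
}

// Wraps the service call so losing the race inside the service surfaces
// with the same code as failing the handler's own pre-check.
function updateStatusPageSlug(
  slug: string,
  updatePageGeneral: (slug: string) => void,
): void {
  try {
    updatePageGeneral(slug);
  } catch (err) {
    if (err instanceof ConflictError) throw slugAlreadyExistsError(slug);
    throw err; // anything else keeps its original mapping
  }
}
```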
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would crash `updateNotification` with `SyntaxError` during
the partial-update read-modify-write. Added `?? "{}"` fallback and a
comment pointing at the schema.
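A small sketch of the read-modify-write with the `?? "{}"` fallback, assuming a simplified row shape. `mergeNotificationData` is a hypothetical helper name, not the actual service function.

```typescript
// The column is nullable at the type level, so `data` may be null on legacy rows.
type NotificationRow = { id: number; data: string | null };

// Hypothetical helper: merges a partial update into the stored JSON payload.
function mergeNotificationData(
  existing: NotificationRow,
  patch: Record<string, unknown>,
): string {
  // Treat a NULL column as an empty object before merging.
  const current = JSON.parse(existing.data ?? "{}") as Record<string, unknown>;
  return JSON.stringify({ ...current, ...patch });
}
```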
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
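The double-encoding failure can be reproduced without a database. In this sketch, `writeJsonColumn` and `readJsonColumn` are hypothetical stand-ins for a JSON-mode column that serialises on write and parses on read.

```typescript
// Stand-ins for a JSON-mode column: serialise on write, parse on read.
function writeJsonColumn(value: unknown): string {
  return JSON.stringify(value);
}
function readJsonColumn(stored: string): unknown {
  return JSON.parse(stored);
}

const configuration = { theme: "default" };

// Bug: stringifying first means the column stores a JSON *string*.
const doubleEncoded = readJsonColumn(writeJsonColumn(JSON.stringify(configuration)));
// Fix: pass the object and let the column layer serialise it once.
const correct = readJsonColumn(writeJsonColumn(configuration));

console.log(typeof doubleEncoded); // "string"
console.log(typeof correct); // "object"
```

A reader that spreads or merges the value expects the second form, which is why the extra `JSON.stringify` broke downstream reads.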
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
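The difference between `??` and `||` on a joined array is easy to check in isolation. The two helper names below are hypothetical; the expressions are the ones quoted above.

```typescript
type MaybeList = string[] | null | undefined;

// Before: `??` only replaces null/undefined, so [] becomes "".
function joinWithNullish(values: MaybeList): string | null {
  return values?.join(",") ?? null;
}

// After: `||` also replaces the empty string, so [] becomes null.
function joinWithOr(values: MaybeList): string | null {
  return values?.join(",") || null;
}

console.log(joinWithNullish([])); // ""
console.log(joinWithOr([])); // null
console.log(joinWithOr(["a.com", "b.com"])); // "a.com,b.com"
```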
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. Use
the joined value directly — one round-trip instead of two, and
eliminates the read-skew window where a just-renamed workspace
could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
**P2 — `writeComponentsPhase` silent monitor→static fallback.**
When the source monitor failed to resolve (e.g. `includeMonitors ===
false`), the component was silently degraded to `type = "static"`
and reported as `created` with no explanation. Other phase writers
populate `resource.error` on any degrade path. Added a matching
error string pointing at the source monitor id (or lack thereof) so
the summary conveys the degrade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): revert discriminated union to flat + .refine
The previous commit switched `componentInput` in the service schema to
a `z.discriminatedUnion("type", [...])` to get a clean `ZodError` at
parse time for the monitor/static invariant. That produced a narrowed
TS shape (`type: "monitor"` → required `monitorId: number`) that every
caller had to match — including the dashboard form, where
react-hook-form can't model discriminated unions cleanly and would
have needed a flat→union adapter at submit time. The ripple was
user-visible frontend churn for a schema-layer concern.
Switched back to a flat `z.object` + cross-field `.refine` on
`(type, monitorId)`. Same parse-time rejection Cubic asked for
(ZodError with a specific path, not an opaque SQLite CHECK failure),
but the inferred TS type stays flat so callers — router input, RHF
form values — keep their existing shape.
Also restored the downstream `&& c.monitorId` guards and
`as number[]` casts in `update-order.ts`. With a flat schema, TS
still sees `monitorId: number | null | undefined` on the monitor
branch; the refine rejects violating input at parse time, but the
guard is needed to narrow for the type system. Matches the
pre-migration shape exactly.
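The trade-off can be illustrated without zod. In this sketch, `refineComponent` is a hypothetical hand-written equivalent of the cross-field `.refine`, and `monitorIdsOf` shows why the `&& c.monitorId` guard and `as number[]` cast are still needed when the inferred type stays flat. The `name` field is an assumed extra property.

```typescript
// Flat shape: callers (router input, form values) keep one object type.
type ComponentInput = {
  type: "monitor" | "static";
  monitorId?: number | null;
  name: string;
};
type Issue = { path: string[]; message: string };

// Hypothetical stand-in for the cross-field refine on (type, monitorId).
function refineComponent(c: ComponentInput): Issue[] {
  const hasMonitorId = c.monitorId !== null && c.monitorId !== undefined;
  if (c.type === "monitor" && !hasMonitorId) {
    return [{ path: ["monitorId"], message: "monitor components need a monitorId" }];
  }
  if (c.type === "static" && hasMonitorId) {
    return [{ path: ["monitorId"], message: "static components must not set monitorId" }];
  }
  return [];
}

// The refine rejects bad input at parse time, but the type system still sees
// `number | null | undefined`, so the guard and cast narrow it for TypeScript.
function monitorIdsOf(components: ComponentInput[]): number[] {
  return components
    .filter((c) => c.type === "monitor" && c.monitorId)
    .map((c) => c.monitorId) as number[];
}
```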
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): reconcile component links on idempotent skip
The idempotency checks in `writeIncidentsPhase` and
`writeMaintenancesPhase` added in the previous pass correctly avoid
duplicate status-report / maintenance rows on rerun, but `continue`-d
out of the writer before the component-link insertion block. The
failure mode this leaves open:
1. Run 1: component phase uses per-resource catch, so a single
component can fail and leave `componentIdMap` partial.
2. The report/maintenance is written with a subset of the intended
links — only the entries whose source id resolved in the map.
3. Run 2: the previously-failed component now succeeds and lands in
`componentIdMap`. The report/maintenance idempotency check hits,
`continue` fires, and the still-missing link is never written.
Both join tables (`statusReportsToPageComponents`,
`maintenancesToPageComponents`) have a composite primary key on
`(parentId, pageComponentId)`. Running the same link-build pass on
the skip path with `.onConflictDoNothing()` is a no-op for the links
already present and adds any that resolved this time round. Matches
the "reruns converge to correct state" model that motivated the
idempotency checks in the first place.
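The convergence argument can be sketched with an in-memory join table. `LinkTable` is a hypothetical stand-in for a table with a composite primary key, and `insertOnConflictDoNothing` mimics an insert with `.onConflictDoNothing()`; `componentIdMap` is the name used above, modelled here as a plain `Map`.

```typescript
type Link = { parentId: number; pageComponentId: number };

// Stand-in for a join table keyed on (parentId, pageComponentId).
class LinkTable {
  private rows = new Map<string, Link>();

  // Returns true when a row was written, false when the key already existed.
  insertOnConflictDoNothing(link: Link): boolean {
    const key = `${link.parentId}:${link.pageComponentId}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, link);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Runs on both the create path and the idempotent-skip path: links whose
// component resolved this run are added, existing links are left untouched.
function reconcileLinks(
  table: LinkTable,
  parentId: number,
  sourceComponentIds: string[],
  componentIdMap: Map<string, number>,
): number {
  let added = 0;
  for (const sourceId of sourceComponentIds) {
    const pageComponentId = componentIdMap.get(sourceId);
    if (pageComponentId === undefined) continue; // still unresolved
    if (table.insertOnConflictDoNothing({ parentId, pageComponentId })) added++;
  }
  return added;
}
```

Calling `reconcileLinks` once with a partial map and again with a complete one leaves the table with exactly the intended links, which is the rerun behaviour the commit describes.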
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Claude review pass
Three small findings from the latest Claude review, bundled:
**#1 — `list.ts` redundant `conditions` array.** `listNotifications`
initialised a one-element `SQL[]` and spread it into `and(...)` with
no second push site anywhere. Collapsed to a direct `eq(...)`. The
pattern is load-bearing in `monitor/list.ts` (two conditions: workspace
scope + soft-delete filter) but notifications have no soft-delete
column, so the indirection was pure noise.
**#2 — router `dataInputSchema` duplicated the service schema.**
`packages/api/src/router/notification.ts` hand-rolled a
`z.partialRecord` structurally identical to the `dataSchema` inside
`packages/services/src/notification/schemas.ts`. Drift hazard: a
future provider-value shape change in the service would be accepted
at the tRPC layer and fail with an opaque error deeper in
`validateNotificationData`. Renamed the service schema to
`NotificationDataInputSchema`, exported it from the service barrel,
and replaced the router's local copy + the now-unused
`servicesNotificationProvider` alias.
**#5 — `update.ts` audit missing `provider` metadata.**
`createNotification` attaches `metadata: { provider: input.provider
}` to its audit row; `updateNotification` didn't. The `provider`
column is recoverable from the `before`/`after` rows, but asymmetric
metadata breaks simple `action + metadata.provider` audit queries.
Added `metadata: { provider: existing.provider }` for parity.
Skipped the two non-fixes: the `enrichNotificationsBatch` drizzle-cast
fragility is the same pattern as `monitor/list.ts`, worth a codebase-
wide change rather than a single-domain carve-out; the `dataSchema`
being "intentionally wide" is already called out in the schema JSDoc
and is correct by design (provider/payload alignment is enforced at
the service boundary by `validateNotificationData`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(services/notification): cover update audit + post-downgrade gate
Claude review noted two gaps in the `updateNotification` suite. Adding
both:
**`notification.update` audit row.** `createNotification` already
asserts an audit row fires with `expectAuditRow`; the update path was
silent. With the per-mutation audit contract (every write emits a
row), the update case needs an equivalent pin so a regression that
drops the emit site is caught. Note: the v1 audit buffer shape
(`AuditLogRecord`) doesn't carry `metadata`, so the `{ provider }`
payload can't be asserted directly here — that coverage lands with
the v2 audit-table move, called out in the test comment.
**Plan-gate after a downgrade.** The Cubic-flagged fix added
`assertProviderAllowed(existing.provider)` to `updateNotification`
so a previously-allowed channel becomes read-only once the workspace
drops to a plan that no longer includes it. The regression test
simulates the downgrade by directly inserting a `pagerduty` row into
the free workspace (bypasses the create-time gate) and then calls
`updateNotification` via `freeCtx` — asserts `LimitExceededError`.
Without the gate the update would silently succeed and leave a
channel on an unsupported plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic P1/P2/P3 pass across the stack
Twelve findings from Cubic's 2026-04-24 review on #2106. Fixes are
bundled on this branch because the later-PR content has cascaded
down via the 2107 squash-merge — each item is called out by file so
the commit is easy to skim.
**P1 — `import/phase-writers.ts` subscribers reconciliation.**
`writeIncidentsPhase` and `writeMaintenancesPhase` already
reconcile component links on the idempotent-skip path.
`writeSubscribersPhase` didn't, so `pageSubscriberToPageComponent`
rows lost to a partial first run are never recovered. Same
`.onConflictDoNothing()` pattern, same composite-PK rationale.
**P1 — `page/schemas.ts:129` UpdatePageAppearance accepts any theme.**
`configuration.theme` was `z.string()`; the read path parses
configuration through `pageConfigurationSchema` which uses the
`THEME_KEYS` enum. A write that the read path then fails to parse is now impossible.
Router now reuses `UpdatePageAppearanceInput` from services so the
tRPC boundary enforces the same enum.
**P1 — `page/schemas.ts:160` UpdatePageConfiguration too permissive.**
Replaced the `z.record(z.string(), z.string()|z.boolean())` with
`pageConfigurationSchema.nullish()`, which is what the read path
already uses. Dashboard form's free-typed `configuration` values are
cast at the mutation call-site (tRPC input parse catches invalid
submits).
**P2 — `page-component/internal.ts:39` validateMonitorIds.**
Added `isNull(monitor.deletedAt)` so tombstoned monitor ids don't
pass as valid attach targets.
**P2 — `page-component/list.ts:71` enrichment returns deleted monitors.**
Same `isNull(monitor.deletedAt)` filter on the enrichment lookup so
`component.monitor` is never populated with a tombstoned row.
**P2 — `page-component/update-order.ts:277` stale ID set for statics.**
`existingComponentIds` was built from the pre-delete snapshot of
`existingComponents`. If an input static carried an id that matched
a monitor component just removed (because its `monitorId` dropped
out of the input set), the `has(id)` check sent the row to the
UPDATE branch, which silently no-op'd against the now-deleted id
and lost the new static. New set filters by `type === "static"` and
subtracts `removedComponentIds` so only surviving static ids take
the update path.
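
The stale-set fix above comes down to which ids may take the UPDATE branch. A minimal sketch, assuming a simplified `ComponentRow` shape and a hypothetical helper name `survivingStaticIds` (neither is taken from `update-order.ts`):

```typescript
// Hypothetical, simplified row shape; the real table has more columns.
type ComponentRow = { id: number; type: "monitor" | "static" };

// Ids allowed on the UPDATE branch: static rows that survive the delete pass.
function survivingStaticIds(
  existing: ComponentRow[],
  removedIds: ReadonlySet<number>,
): Set<number> {
  return new Set(
    existing
      .filter((c) => c.type === "static" && !removedIds.has(c.id))
      .map((c) => c.id),
  );
}

const existingComponents: ComponentRow[] = [
  { id: 1, type: "monitor" }, // removed: its monitorId dropped out of the input
  { id: 2, type: "static" }, // survives
  { id: 3, type: "static" }, // removed in this pass
];
const removedComponentIds = new Set<number>([1, 3]);
const updatableIds = survivingStaticIds(existingComponents, removedComponentIds);

// An input static that reuses id 1 now falls through to the INSERT branch.
const inputStaticId = 1;
const branch = updatableIds.has(inputStaticId) ? "update" : "insert";
```

Building the set from the post-delete view means an id that collides with a just-removed row can never be routed to an UPDATE that matches nothing.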
**P2 — `invitation/accept.ts:75` re-assert invitation expiry.**
Conditional UPDATE only re-checked `acceptedAt IS NULL`. An
invitation expiring between the initial read and the update could
still be claimed. Added `gte(expiresAt, now)` to the UPDATE predicate.
**P2 — `invitation/accept.ts:82` duplicate workspace membership.**
`.onConflictDoNothing()` on `usersToWorkspaces` insert keyed on the
`(userId, workspaceId)` composite PK. An already-member invitee
previously blew up the unique constraint *after…
* feat(services): migrate incident domain onto service layer
Third domain migration, stacked on maintenance. tRPC-only — no Connect
handler, no Slack path — so this PR mostly exercises the
`ServiceContext` actor variants without surface-specific adapters.
## Services (`packages/services/src/incident/`)
- `acknowledgeIncident`, `resolveIncident`, `deleteIncident`,
`listIncidents`, `getIncident`.
- Both `acknowledge` / `resolve` stamp the acting user's id onto
`acknowledged_by` / `resolved_by` via the new `tryGetActorUserId`
helper on `ServiceContext`. Non-user actors (system, webhook, API keys
without a linked userId) stamp `null`.
- Idempotent `ConflictError` when an incident is already acknowledged
or resolved — matches existing tRPC's `BAD_REQUEST` semantics.
- All mutations wrapped in `withTransaction`, emit `emitAudit`.
## list.ts
- Batch-enriches monitors via a single IN query against distinct
`monitorId`s, which pairs well with the 10_000 sentinel that the tRPC layer passes.
Avoids the N+1 Cubic flagged on the maintenance PR.
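
A rough sketch of the batch-enrichment shape described above, assuming an in-memory stand-in for the single `IN` query. `selectMonitorsByIds`, `enrichIncidents`, and the row types are hypothetical names for illustration, not the service's actual API:

```typescript
type Incident = { id: number; monitorId: number | null };
type Monitor = { id: number; name: string };

// Stand-in for one `WHERE id IN (...)` query; counts round-trips.
let queryCount = 0;
const monitorTable: Monitor[] = [
  { id: 10, name: "api" },
  { id: 11, name: "web" },
];
function selectMonitorsByIds(ids: number[]): Monitor[] {
  queryCount += 1;
  const wanted = new Set(ids);
  return monitorTable.filter((m) => wanted.has(m.id));
}

function enrichIncidents(incidents: Incident[]) {
  // Collect distinct, non-null monitor ids once.
  const ids = Array.from(
    new Set(
      incidents
        .map((i) => i.monitorId)
        .filter((id): id is number => id !== null),
    ),
  );
  // One query for the whole list instead of one per incident.
  const rows = ids.length > 0 ? selectMonitorsByIds(ids) : [];
  const byId = new Map<number, Monitor>();
  for (const m of rows) byId.set(m.id, m);
  return incidents.map((i) => ({
    ...i,
    monitor: i.monitorId === null ? null : (byId.get(i.monitorId) ?? null),
  }));
}

const enriched = enrichIncidents([
  { id: 1, monitorId: 10 },
  { id: 2, monitorId: 10 },
  { id: 3, monitorId: 11 },
]);
```

The query count stays at one regardless of list length, which is the property that matters when the caller passes a very large limit.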
## Surfaces
- **tRPC** (`packages/api/src/router/incident.ts`): all four procedures
call services. `delete` catches `NotFoundError` to preserve the
pre-migration idempotent behaviour. `list` narrows `monitor` to
non-null in the return type — every incident is expected to have an
associated monitor via the FK. `acknowledge` / `resolve` still return
`true` to match the old contract.
## Context helper
- `tryGetActorUserId(actor)` in `packages/services/src/context.ts` —
returns the openstatus user id for user/apiKey/slack actors when
available, `null` for system/webhook. Exported from the root barrel.
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/incident.ts`.
- Subpath export `@openstatus/services/incident`.
## Tests
- `packages/services/src/incident/__tests__/incident.test.ts` — happy
paths for acknowledge / resolve / delete, already-acknowledged and
already-resolved `ConflictError`, workspace isolation across all
mutations, list workspace isolation, and the batch monitor enrichment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/incident): address Cubic review
- **TOCTOU on acknowledge / resolve** — the old read-then-update allowed
two concurrent acknowledgers (or resolvers) to both succeed. Replaced
with a conditional update (`WHERE acknowledged_at IS NULL` /
`WHERE resolved_at IS NULL`); the loser of the race gets no row back
and we throw the same `ConflictError` the pre-read would have raised.
Dropped the now-unreachable `InternalServiceError` branch.
- **Monitor enrichment workspace scope** — `enrichIncidentsBatch` now
filters `monitor.workspaceId = ctx.workspace.id` alongside the
`inArray` on monitor id. Defence-in-depth: the `incident.monitorId`
column has no FK against workspace ownership, so a cross-workspace
pointer (however unlikely) no longer leaks the other workspace's
monitor row.
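
The conditional-update pattern from the first bullet can be sketched as follows. This is an in-memory approximation of `UPDATE ... WHERE acknowledged_at IS NULL RETURNING *`; the function names and row shape are assumptions, and a plain `Error` stands in for the service's `ConflictError`:

```typescript
type IncidentRow = {
  id: number;
  acknowledgedAt: Date | null;
  acknowledgedBy: number | null;
};

// Stand-in for the conditional UPDATE: only rows that are still
// unacknowledged match, and only matched rows are returned.
function updateWhereUnacknowledged(
  table: IncidentRow[],
  id: number,
  userId: number | null,
  now: Date,
): IncidentRow[] {
  const updated: IncidentRow[] = [];
  for (const row of table) {
    if (row.id === id && row.acknowledgedAt === null) {
      row.acknowledgedAt = now;
      row.acknowledgedBy = userId;
      updated.push(row);
    }
  }
  return updated;
}

function acknowledgeIncident(
  table: IncidentRow[],
  id: number,
  userId: number | null,
): IncidentRow {
  const rows = updateWhereUnacknowledged(table, id, userId, new Date());
  // No row back means someone else won the race (this sketch does not
  // separately handle a missing incident).
  if (rows.length === 0) {
    throw new Error("ConflictError: incident already acknowledged");
  }
  return rows[0];
}

const incidents: IncidentRow[] = [
  { id: 1, acknowledgedAt: null, acknowledgedBy: null },
];
const winner = acknowledgeIncident(incidents, 1, 7);
let loserMessage = "";
try {
  acknowledgeIncident(incidents, 1, 8);
} catch (e) {
  loserMessage = (e as Error).message;
}
```

Because the check and the write are one statement, there is no window between them for a second caller to slip through.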
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate monitor domain tRPC (PR 4/N) (#2105)
* feat(services): migrate monitor domain tRPC onto service layer
Fourth domain, the largest so far. Migrates the **tRPC** monitor router
(all 14 procedures) onto `@openstatus/services/monitor`. The Connect
handler (`apps/server/src/routes/rpc/services/monitor/`) and the v1 REST
`apps/server/src/routes/v1/monitors/*` endpoints stay untouched for now
— they are separate external-API surfaces and warrant their own
follow-up PRs with their own test suites.
- `createMonitor`, `cloneMonitor`, `deleteMonitor`, `deleteMonitors`
(bulk soft-delete), `getMonitor`, `listMonitors`.
- `updateMonitorGeneral` — preserves the existing tRPC `updateGeneral`
behaviour including jobType switching (HTTP ↔ TCP ↔ DNS). Called out
as a code smell during exploration, but preserved intentionally since
it's the dashboard's current edit flow. Kept jobType-agnostic rather
than split into 3 separate update methods.
- `updateMonitorRetry`, `updateMonitorFollowRedirects`,
`updateMonitorOtel`, `updateMonitorPublic`,
`updateMonitorResponseTime` — field-specific setters.
- `updateMonitorSchedulingRegions` — enforces plan limits
(periodicity / region-count / region-access) + validates the
private-location ids before replacing the association set.
- `updateMonitorTags`, `updateMonitorNotifiers` — validate and
replace the tag / notifier association sets.
- `bulkUpdateMonitors` — batched toggle of `public` / `active`
across multiple monitor ids. Matches the old `updateMonitors`
procedure.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- Cascade deletes: `monitor_tag_to_monitor`,
`notifications_to_monitors`, `page_component` rows are torn down on
delete (matches pre-migration behaviour — bypasses FK cascades
because some rows reference the monitor without cascade).
- `validateTagIds` / `validateNotificationIds` / `validatePrivateLocationIds`
— workspace-scoped validators with dedupe.
- `pickDefaultRegions(workspace)` — ports the old "randomly pick 4
free / 6 paid regions excluding deprecated" logic.
- `serialiseAssertions` / `headersToDbJson` — assertion / header
serialisation moved out of the tRPC router.
- `countMonitorsInWorkspace` for quota checks.
`listMonitors` does two IN queries (tags + incidents) regardless of
list size. `getMonitor` reuses the same path with the richer
`{ notifications, privateLocations }` toggle and a singleton. Same
pattern as status-report's batched fix.
- **tRPC** (`packages/api/src/router/monitor.ts`): every procedure is a
thin wrapper. `delete` catches `NotFoundError` for idempotency.
`new` / `updateGeneral` keep the `testHttp` / `testTcp` / `testDns`
pre-save check at the tRPC layer — services unconditionally save,
callers decide whether to pre-validate.
- Connect RPC handler at `apps/server/src/routes/rpc/services/monitor/`
— still uses its own helpers. Will be migrated in a follow-up
(4b).
- v1 REST endpoints at `apps/server/src/routes/v1/monitors/*` — yet
another external surface; dedicated PR later.
- `packages/api/src/service/import.ts` monitor writes — covered by
the plan's dedicated PR 9.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/monitor.ts`. The Connect monitor handler
stays out of scope until 4b.
- Subpath export `@openstatus/services/monitor`.
- `packages/services/src/monitor/__tests__/monitor.test.ts` —
create (http / tcp / dns), delete cascade, bulk delete, clone,
tags / notifiers validation (forbidden when cross-workspace),
list / get with workspace isolation + soft-delete hiding,
updateMonitorGeneral round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): declare @openstatus/regions + @openstatus/assertions deps
The monitor domain pulls `regionDict` from `@openstatus/regions` (for
the default-region picker) and assertion classes / validators from
`@openstatus/assertions`. Both were missing from
`packages/services/package.json`, so Vercel's Next.js build for
`apps/status-page` — which transitively imports the services package
via `@openstatus/api` — failed with "Module not found" on
`packages/services/src/monitor/internal.ts:20` and `schemas.ts:1`.
Local pnpm workspaces resolved the imports via hoisting, which masked
the missing declarations until an actual Next.js build forced strict
resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/monitor): address Cubic review
- **`timeout` / `degradedAfter` bounds** (schemas.ts) — mirror the
0–60_000 ms cap from `insertMonitorSchema`. Values outside the range
are rejected before the UPDATE instead of being silently persisted.
- **`jsonBody` assertion mapping** (internal.ts) — the serialiser was
silently dropping `jsonBody`-typed input because the runtime class
wasn't wired up. Added the `JsonBodyAssertion` branch; the class has
existed in `@openstatus/assertions` the whole time, this was just a
missing case.
- **Clone resets `status`** (clone.ts) — cloning no longer inherits the
source's current `error`/`degraded` health. Freshly cloned monitors
start at `"active"` and settle on their first check.
- **Delete filters soft-deleted** (delete.ts) — the pre-check now
includes `isNull(monitor.deletedAt)`, so a repeat delete returns
`NotFoundError` (preserved as idempotent at the tRPC layer) instead
of re-running the cascades and emitting duplicate audits.
- **List/get workspace scope on relations** (list.ts) — `enrichMonitorsBatch`
takes `workspaceId` and scopes the incident / tag / notification /
private-location IN queries to the caller's workspace. Defence-in-depth
against inconsistent FK data (none of those tables enforce workspace
ownership at the FK level).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate notification CRUD (PR 5/N) (#2106)
* feat(services): migrate notification CRUD onto service layer
Fifth domain. Migrates the four CRUD procedures — `list`, `new`,
`updateNotifier`, `delete` — of the notification tRPC router onto
`@openstatus/services/notification`. The three integration-helper
procedures (`sendTest`, `createTelegramToken`, `getTelegramUpdates`)
stay inline at the tRPC layer and are explicitly out of scope for this
migration.
- `createNotification` — enforces plan limits on both the channel count
(`notification-channels`) and the provider itself (`sms` / `pagerduty`
/ `opsgenie` / `grafana-oncall` / `whatsapp` require plan flags);
validates the loose `data` payload against `NotificationDataSchema`;
validates monitor ids are in-workspace and not soft-deleted.
- `updateNotification` — replaces name / data / monitor associations in
a single transaction with the same validation rules as create.
- `deleteNotification` — hard-delete (FK cascade clears associations).
The tRPC wrapper swallows `NotFoundError` to preserve the old
idempotent behaviour.
- `listNotifications`, `getNotification` — batched IN query enriches
monitors per notification. Monitor enrichment is workspace-scoped and
filters soft-deleted monitors for defence-in-depth.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- **tRPC** (`packages/api/src/router/notification.ts`): `list` / `new` /
`updateNotifier` / `delete` become thin service wrappers. `sendTest`
+ `createTelegramToken` + `getTelegramUpdates` are unchanged.
- **`sendTest` migration** — the dispatch switch imports from 10
`@openstatus/notification-*` packages. Moving it into services would
pull those as direct deps; the plan's phrasing ("Channel CRUD +
test-dispatch") allows this as a later extraction.
- **`createTelegramToken` / `getTelegramUpdates`** — redis + external
Telegram API helpers; transport UX, not domain operations.
- **Biome scope for `notification.ts`** — the file still imports
`@openstatus/db/src/schema` for the `sendTest` provider data schemas.
Will land with the sendTest migration follow-up.
- **Connect RPC notification handler** — stays on its own helpers;
follow-up aligned with PR 4's Connect deferral.
- `__tests__/notification.test.ts` covers create (including
`ValidationError` on malformed data, `LimitExceededError` on gated
provider, `ForbiddenError` on cross-workspace monitor), update
(association replacement, cross-workspace `NotFoundError`), delete,
list/get workspace isolation + monitor enrichment scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Cubic review
- **Update flow plan gate** (update.ts) — `updateNotification` was
skipping `assertProviderAllowed`, so a user who downgraded their plan
could still edit a notification configured with a now-restricted
provider. Re-check against the stored `existing.provider` to match
the create-time gate.
- **Provider / data match** (internal.ts) — `NotificationDataSchema` is
a union, so `{ provider: "discord", data: { slack: "…" } }` passed
the union check even though the payload key doesn't match the
provider. `validateNotificationData` now takes the provider and
asserts `provider in data` after the top-level parse. Applied in
both `create` and `update` (update uses the stored provider since
the API doesn't allow provider changes).
Added a test for the mismatch case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): tighten data validation against provider schema
Cubic's follow-up on the previous fix was right: checking only
`provider in data` isn't enough. `NotificationDataSchema` is a union,
so a payload like `{ discord: "not-a-url", slack: "valid-url" }` passes
because the union matches the slack variant — the extra `discord` key
is ignored, and my key-presence check sees `"discord"` and lets it
through.
Replaced the union parse + key check with a provider-specific schema
lookup (`providerDataSchemas[provider].safeParse(data)`). Each
canonical channel schema is keyed by its provider name and validates
the shape / content of the value, so the new check catches both the
mismatched-provider and malformed-payload cases in one pass.
Added a test covering the exact case Cubic flagged — invalid `discord`
URL alongside a valid `slack` URL now rejects with ValidationError.
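
To illustrate why the provider-keyed lookup closes the gap, here is a dependency-free sketch. It swaps the zod schemas for hand-written predicates, covers only two providers, and every name in it (`unionCheck`, `providerValidators`, `isHttpsUrl`) is hypothetical:

```typescript
type Provider = "discord" | "slack";
type Payload = Record<string, unknown>;
type Validator = (data: Payload) => boolean;

// Simplified URL check, standing in for the real per-channel schemas.
const isHttpsUrl = (value: unknown): boolean =>
  typeof value === "string" && /^https:\/\/\S+$/.test(value);

// Union-style check: passes when ANY variant matches the payload.
const unionCheck: Validator = (data) =>
  isHttpsUrl(data.discord) || isHttpsUrl(data.slack);

// Provider-keyed lookup: only the schema for the stored provider runs.
const providerValidators: Record<Provider, Validator> = {
  discord: (data) => isHttpsUrl(data.discord),
  slack: (data) => isHttpsUrl(data.slack),
};

function validateNotificationData(provider: Provider, data: Payload): boolean {
  return providerValidators[provider](data);
}

const payload: Payload = {
  discord: "not-a-url",
  slack: "https://hooks.example.com/abc",
};

// Old approach: union parse plus key-presence check lets the payload through.
const oldResult = unionCheck(payload) && "discord" in payload;
// New approach: the discord validator sees the bad URL and rejects.
const newResult = validateNotificationData("discord", payload);
```

The union check answers "does this match some provider", while the keyed lookup answers "does this match the provider the row actually uses".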
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain (PR 6/N) (#2107)
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
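
For reference, the dedupe expression behaves like this on the input from the message above (`dedupeIds` is a hypothetical wrapper name):

```typescript
// Set keeps first-seen insertion order, so the echo stays stable.
function dedupeIds(ids: string[]): string[] {
  return Array.from(new Set(ids));
}

const echoedIds = dedupeIds(["1", "1", "2"]);
```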
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection,
ip-restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
All 13 procedures are thin wrappers. `delete` catches `NotFoundError`
for idempotency. `updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
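
The shared-transaction behaviour in the update bullet can be approximated with the sketch below. It is synchronous and in-memory for brevity (the real helpers are async and database-backed), and carrying the open transaction on a `ctx.tx` field is an assumption about how the detection works:

```typescript
// Hypothetical stand-ins for a transaction handle and a service context.
type Tx = { id: number };
type Ctx = { tx?: Tx };

let openedTransactions = 0;

function withTransaction<T>(ctx: Ctx, fn: (tx: Tx) => T): T {
  // A pre-opened tx is reused, so inner services do not nest.
  if (ctx.tx) return fn(ctx.tx);
  openedTransactions += 1;
  const tx: Tx = { id: openedTransactions };
  // A real implementation would commit on return and roll back on throw.
  return fn(tx);
}

const seenTxIds: number[] = [];

// Two per-section services, each wrapped in its own withTransaction.
function updatePageGeneralSketch(ctx: Ctx): void {
  withTransaction(ctx, (tx) => {
    seenTxIds.push(tx.id);
  });
}
function updatePageLinksSketch(ctx: Ctx): void {
  withTransaction(ctx, (tx) => {
    seenTxIds.push(tx.id);
  });
}

// Handler-level orchestration: open once, thread the tx through the context.
function updateStatusPageSketch(ctx: Ctx): void {
  withTransaction(ctx, (tx) => {
    const inner: Ctx = { ...ctx, tx };
    updatePageGeneralSketch(inner);
    updatePageLinksSketch(inner);
  });
}

updateStatusPageSketch({});
```

Both section updates observe the same transaction id, so a failure in the second would roll back the first in a real database-backed version.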
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
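
The `undefined` versus `null` distinction in the last bullet is easy to miss, so here is a small model of it. `applySet` only imitates the "undefined means skip this column" behaviour described above; it is not Drizzle's API, and the row shape is simplified:

```typescript
type PageRow = {
  authEmailDomains: string | null;
  allowedIpRanges: string | null;
};
type Patch = { [K in keyof PageRow]?: PageRow[K] | undefined };

// Imitates a `.set()` that skips columns whose value is undefined.
function applySet(row: PageRow, patch: Patch): PageRow {
  const next: PageRow = { ...row };
  if (patch.authEmailDomains !== undefined) {
    next.authEmailDomains = patch.authEmailDomains;
  }
  if (patch.allowedIpRanges !== undefined) {
    next.allowedIpRanges = patch.allowedIpRanges;
  }
  return next;
}

// Before the fix: a cleared field becomes undefined and is skipped.
const joinOnly = (values?: string[]): string | undefined => values?.join(",");
// After the fix: a cleared field becomes null and overwrites the column.
const joinOrNull = (values?: string[]): string | null =>
  values?.join(",") ?? null;

const storedPage: PageRow = {
  authEmailDomains: "example.com",
  allowedIpRanges: null,
};
const beforeFix = applySet(storedPage, { authEmailDomains: joinOnly(undefined) });
const afterFix = applySet(storedPage, { authEmailDomains: joinOrNull(undefined) });
```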
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the
paid-plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
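
The wrapper shape can be sketched as below. Only three mappings are stated in these commit messages (`NotFoundError` to `NOT_FOUND`, `ForbiddenError` to `FORBIDDEN`, `ValidationError` to `BAD_REQUEST`); the single `ServiceError` class with a `code` field, the `INTERNAL_SERVER_ERROR` fallback, and the synchronous signatures are simplifying assumptions:

```typescript
type ServiceCode = "not_found" | "forbidden" | "validation";
type TRPCCode =
  | "NOT_FOUND"
  | "FORBIDDEN"
  | "BAD_REQUEST"
  | "INTERNAL_SERVER_ERROR";

// Hypothetical single error class; the real package uses one class per kind.
class ServiceError extends Error {
  readonly code: ServiceCode;
  constructor(code: ServiceCode, message: string) {
    super(message);
    this.code = code;
  }
}

type TRPCLikeError = { code: TRPCCode; message: string };

const codeMap: Record<ServiceCode, TRPCCode> = {
  not_found: "NOT_FOUND",
  forbidden: "FORBIDDEN",
  validation: "BAD_REQUEST",
};

function toTRPCError(e: unknown): TRPCLikeError {
  if (e instanceof ServiceError) {
    return { code: codeMap[e.code], message: e.message };
  }
  // Assumed fallback for anything that is not a known service error.
  return { code: "INTERNAL_SERVER_ERROR", message: "unexpected error" };
}

// Thin wrapper: run the service, translate failures at the boundary.
function callService<T>(fn: () => T): T {
  try {
    return fn();
  } catch (e) {
    throw toTRPCError(e);
  }
}

// Idempotent delete: a missing row is treated as already deleted.
function deleteIdempotent(fn: () => void): void {
  try {
    fn();
  } catch (e) {
    if (e instanceof ServiceError && e.code === "not_found") return;
    throw toTRPCError(e);
  }
}

const mapped = toTRPCError(
  new ServiceError("forbidden", "monitor belongs to another workspace"),
);
let deleteThrew = false;
try {
  deleteIdempotent(() => {
    throw new ServiceError("not_found", "page not found");
  });
} catch (_e) {
  deleteThrew = true;
}
```

Keeping the translation in one function means every router reports the same code for the same service failure.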
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy /
bad-format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
The tRPC router shrinks from 124 lines to 28 and is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action |
| --- | --- |
| page | `page.create` |
| componentGroups | `page_component_group.create` |
| components | `page_component.create` |
| monitors | `monitor.create` |
| incidents | `status_report.create` + `status_report.add_update` per update |
| maintenances | `maintenance.create` |
| subscribers | `page_subscriber.create` |
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
To support this, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a non-
Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
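The exhaustive-switch shape described for `provider.ts` can be sketched as below. This is a minimal illustration, not the actual file: the `ProviderName` union, the `Provider` / `ProviderConfig` shapes, and the `assertNever` helper are assumptions made for the example.

```typescript
// Assumed union: the commit message only confirms "statuspage" as a provider.
type ProviderName = "statuspage";

// Hypothetical config and provider shapes, for illustration only.
interface ProviderConfig {
  apiKey: string;
}

interface Provider {
  name: ProviderName;
  config: ProviderConfig;
}

// Hypothetical helper: accepts only `never`, so the compiler flags any
// union member that is not handled before the default branch.
function assertNever(value: never): never {
  throw new Error(`Unknown import provider: ${String(value)}`);
}

function createProvider(name: ProviderName, config: ProviderConfig): Provider {
  switch (name) {
    case "statuspage":
      return { name, config };
    default:
      // Unreachable for well-typed input; throws for an unknown runtime
      // value instead of silently running the Statuspage adapter.
      return assertNever(name);
  }
}
```

Adding a second member to `ProviderName` without a matching `case` turns the `default` branch into a compile error, which is the property the fix relies on.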
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass — five code/doc fixes, one
documentation-only note.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
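A rough sketch of the actor-derived pattern, under stated assumptions: the `Actor` union and the signature of `tryGetActorUserId` are guesses for illustration, and `resolveAcceptingUserId` is a hypothetical wrapper rather than a function from the services package.

```typescript
// Hypothetical actor shape; the real ServiceContext actor type may differ.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string };

class UnauthorizedError extends Error {}

// Assumed signature for the helper named in the commit message.
function tryGetActorUserId(actor: Actor): number | undefined {
  return actor.type === "user" ? actor.userId : undefined;
}

// The user id comes from the authenticated actor, never from caller input,
// so a caller cannot insert a membership under someone else's id.
function resolveAcceptingUserId(actor: Actor): number {
  const userId = tryGetActorUserId(actor);
  if (userId === undefined) {
    throw new UnauthorizedError("Only user actors can accept an invitation");
  }
  return userId;
}
```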
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor: no dedicated test was added for the concurrent-accept race. It is
already guarded by the conditional UPDATE + `ConflictError` path from the
earlier P1 fix, and mocking it reliably against SQLite is noisy and not
worth the test complexity. Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
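The early return can be sketched without a database. Everything here is illustrative: the row shapes are assumptions, and `findUsersByIds` is a stand-in that throws on an empty id list the way the commit describes for an empty `inArray`.

```typescript
// Hypothetical row shapes for the sketch.
interface ApiKeyRow {
  id: number;
  name: string;
  createdById: number | null;
}

interface User {
  id: number;
  email: string;
}

interface ApiKeyWithCreator extends ApiKeyRow {
  createdBy: User | undefined;
}

// Stand-in for the DB lookup; mimics the empty-list failure described above.
function findUsersByIds(ids: number[], users: User[]): User[] {
  if (ids.length === 0) throw new Error("At least one value must be provided");
  return users.filter((u) => ids.includes(u.id));
}

function enrichWithCreators(keys: ApiKeyRow[], users: User[]): ApiKeyWithCreator[] {
  // Drop legacy null creators and de-duplicate before the lookup.
  const creatorIds = Array.from(
    new Set(keys.map((k) => k.createdById).filter((id): id is number => id !== null)),
  );

  // Early return: with nothing to look up, skip the IN query entirely.
  if (creatorIds.length === 0) {
    return keys.map((k) => ({ ...k, createdBy: undefined }));
  }

  const byId = new Map(findUsersByIds(creatorIds, users).map((u) => [u.id, u] as const));
  return keys.map((k) => ({
    ...k,
    createdBy: k.createdById === null ? undefined : byId.get(k.createdById),
  }));
}
```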
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/
router/import.ts` and `packages/services/src/import/limits.ts` — both
live in the next PR up the stack (#2111 / feat/services-import) and
will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Now short-circuits to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
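A minimal sketch of the short-circuit, not the actual `utils.ts`. The tier list, the function signature, and the empty-allowed fallback (slowest known tier) are assumptions; only the unknown-value behaviour is taken from the description above.

```typescript
// Assumed tier order, fastest first; the real PERIODICITY_ORDER may differ.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

// Assumption: with no allowed tiers at all, fall back to the slowest known one.
const SLOWEST_KNOWN: Periodicity = "1h";

function clampPeriodicity(requested: string, allowed: readonly Periodicity[]): Periodicity {
  // Allowed tiers, sorted fastest to slowest.
  const allowedInOrder = PERIODICITY_ORDER.filter((p) => allowed.includes(p));
  if (allowedInOrder.length === 0) return SLOWEST_KNOWN;
  const slowestAllowed = allowedInOrder[allowedInOrder.length - 1] ?? SLOWEST_KNOWN;

  const requestedIndex = (PERIODICITY_ORDER as readonly string[]).indexOf(requested);
  // Unknown value: short-circuit instead of clamping -1 to index 0,
  // which would start the walk at the fastest tier.
  if (requestedIndex === -1) return slowestAllowed;

  // Walk from the requested tier toward slower tiers only.
  for (const candidate of PERIODICITY_ORDER.slice(requestedIndex)) {
    if (allowed.includes(candidate)) return candidate;
  }
  // Nothing at or slower than the request is allowed (a choice made for
  // this sketch): return the slowest tier the plan permits.
  return slowestAllowed;
}
```

For example, with `["1m", "10m"]` allowed, a request for `"5m"` resolves to `"10m"`, and an unrecognised value also resolves to `"10m"` rather than `"1m"`.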
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would crash `updateNotification` with `SyntaxError` during
the partial-update read-modify-write. Added `?? "{}"` fallback and a
comment pointing at the schema.
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
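The difference between the two operators is easy to reproduce in isolation. The helper names below are hypothetical; only the `?.join(",")` expressions mirror the description above.

```typescript
// Previous behaviour: `??` only replaces null/undefined, so "" survives.
function joinWithNullish(values: string[] | null | undefined): string | null {
  return values?.join(",") ?? null;
}

// Fixed behaviour: `||` also replaces the empty string produced by [].join(",").
function joinOrNull(values: string[] | null | undefined): string | null {
  return values?.join(",") || null;
}
```

With these helpers, `joinWithNullish([])` yields `""` while `joinOrNull([])` yields `null`; non-empty arrays produce the same comma-joined string in both.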
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. Use
the joined value directly — one round-trip instead of two, and
eliminates the read-skew window where a just-renamed workspace
could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
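As a sketch of the fallback (the `AuditEntity` type and the `resolveImportAuditEntity` helper are hypothetical names, not taken from `run.ts`):

```typescript
// Hypothetical shape for the entity attribution on an audit row.
type AuditEntity =
  | { entityType: "page"; entityId: number }
  | { entityType: "workspace"; entityId: number };

// Attribute the rollup row to the page when one exists, otherwise to the
// workspace, instead of writing a placeholder page id of 0.
function resolveImportAuditEntity(
  targetPageId: number | null | undefined,
  workspaceId: number,
): AuditEntity {
  if (targetPageId != null) {
    return { entityType: "page", entityId: targetPageId };
  }
  return { entityType: "workspace", entityId: workspaceId };
}
```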
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
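Why the `from/to` pair matters can be shown with an in-memory version of the existence check. The `MaintenanceKey` shape and both helper names are assumptions; the real writer queries the database instead of an array.

```typescript
// Hypothetical key shape mirroring (title, pageId, from, to, workspaceId).
interface MaintenanceKey {
  title: string;
  pageId: number;
  workspaceId: number;
  from: Date;
  to: Date;
}

function isSameMaintenance(a: MaintenanceKey, b: MaintenanceKey): boolean {
  return (
    a.title === b.title &&
    a.pageId === b.pageId &&
    a.workspaceId === b.workspaceId &&
    // Compare timestamps, not Date object identity.
    a.from.getTime() === b.from.getTime() &&
    a.to.getTime() === b.to.getTime()
  );
}

function findExistingMaintenance(
  existing: MaintenanceKey[],
  candidate: MaintenanceKey,
): MaintenanceKey | undefined {
  return existing.find((row) => isSameMaintenance(row, candidate));
}
```

A second "DB upgrade" scheduled for a different window does not match the first, so it is still imported; a rerun of the same window matches and is skipped.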
**P2 — `writeComponentsPhase` silent monitor→static fallback.**
When the source monitor failed to resolve (e.g. `includeMonitors ===
false`), the component was silently degraded to `type = "static"`
and reported as `created` with no explanation. Other phase writers
populate `resource.error` on any degrade path. Added a matching
error string pointing at the source monitor id (or lack thereof) so
the summary conveys the degrade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): revert discriminated union to flat + .refine
The previous commit switched `componentInput` in the service schema to
a `z.discriminatedUnion("type", [...])` to get a clean `ZodError` at
parse time for the monitor/static invariant. That produced a narrowed
TS shape (`type: "monitor"` → required `monitorId: number`) that every
caller had to match — including the dashboard form, where
react-hook-form can't model discriminated unions cleanly and would
have needed a flat→union adapter at submit time. The ripple was
user-visible frontend churn for a schema-layer concern.
Switched back to a flat `z.object` + cross-field `.refine` on
`(type, monitorId)`. Same parse-time rejection Cubic asked for
(ZodError with a specific path, not an opaque SQLite CHECK failure),
but the inferred TS type stays flat so callers — router input, RHF
form values — keep their existing shape.
Also restored the downstream `&& c.monitorId` guards and
`as number[]` casts in `update-order.ts`. With a flat schema, TS
still sees `monitorId: number | null | undefined` on the monitor
branch; the refine rejects violating input at parse time, but the
guard is needed to narrow for the type system. Matches the
pre-migration shape exactly.
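The trade-off can be illustrated without Zod. In this sketch `checkComponent` is a hand-rolled stand-in for the cross-field `.refine`, and the `ComponentInput` / `Issue` shapes are assumptions; the point is that the type stays flat, so the downstream guard and cast are still needed.

```typescript
// Flat shape: `monitorId` stays optional on both component types.
interface ComponentInput {
  name: string;
  type: "monitor" | "static";
  monitorId?: number | null;
}

interface Issue {
  path: string[];
  message: string;
}

// Stand-in for a cross-field refine on (type, monitorId).
function checkComponent(input: ComponentInput): Issue[] {
  const hasMonitorId = input.monitorId !== null && input.monitorId !== undefined;
  if (input.type === "monitor" && !hasMonitorId) {
    return [{ path: ["monitorId"], message: "monitorId is required for monitor components" }];
  }
  if (input.type === "static" && hasMonitorId) {
    return [{ path: ["monitorId"], message: "monitorId must be omitted for static components" }];
  }
  return [];
}

// The runtime check does not narrow the static type, so the guard plus
// cast remain necessary when collecting monitor ids.
function collectMonitorIds(components: ComponentInput[]): number[] {
  return components
    .filter((c) => c.type === "monitor" && c.monitorId)
    .map((c) => c.monitorId) as number[];
}
```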
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): reconcile component links on idempotent skip
The idempotency checks in `writeIncidentsPhase` and
`writeMaintenancesPhase` added in the previous pass correctly avoid
duplicate status-report / maintenance rows on rerun, but `continue`-d
out of the writer before the component-link insertion block. The
failure mode this leaves open:
1. Run 1: component phase uses per-resource catch, so a single
component can fail and leave `componentIdMap` partial.
2. The report/maintenance is written with a subset of the intended
links — only the entries whose source id resolved in the map.
3. Run 2: the previously-failed component now succeeds and lands in
`componentIdMap`. The report/maintenance idempotency check hits,
`continue` fires, and the still-missing link is never written.
Both join tables (`statusReportsToPageComponents`,
`maintenancesToPageComponents`) have a composite primary key on
`(parentId, pageComponentId)`. Running the same link-build pass on
the skip path with `.onConflictDoNothing()` is a no-op for the links
already present and adds any that resolved this time round. Matches
the "reruns converge to correct state" model that motivated the
idempotency checks in the first place.
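The convergence argument can be sketched with an in-memory join table. The `Set` keyed on the composite key stands in for the real tables and `.onConflictDoNothing()`; `LinkTable`, `insertLinkIfAbsent`, and `reconcileLinks` are hypothetical names.

```typescript
// In-memory stand-in for a join table with a composite primary key
// on (parentId, pageComponentId).
type LinkTable = Set<string>;

// Mimics an insert with on-conflict-do-nothing: returns true only if a row was added.
function insertLinkIfAbsent(table: LinkTable, parentId: number, pageComponentId: number): boolean {
  const key = `${parentId}:${pageComponentId}`;
  if (table.has(key)) return false;
  table.add(key);
  return true;
}

// Runs on both the create path and the idempotent-skip path.
function reconcileLinks(
  table: LinkTable,
  parentId: number,
  sourceComponentIds: string[],
  componentIdMap: Map<string, number>,
): number {
  let added = 0;
  for (const sourceId of sourceComponentIds) {
    const pageComponentId = componentIdMap.get(sourceId);
    // Component not resolved in this run; a later rerun may pick it up.
    if (pageComponentId === undefined) continue;
    if (insertLinkIfAbsent(table, parentId, pageComponentId)) added++;
  }
  return added;
}
```

Run 1 with a partial `componentIdMap` writes only the resolvable links; run 2 with the full map adds the missing one, and any further rerun adds nothing.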
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Claude review pass
Three small findings from the latest Claude review, bundled:
**#1 — `list.ts` redundant `conditions` array.** `listNotifications`
initialised a one-element `SQL[]` and spread it into `and(...)` with
no second push site anywhere. Collapsed to a direct `eq(...)`. The
pattern is load-bearing in `monitor/list.ts` (two conditions: workspace
scope + soft-delete filter) but notifications have no soft-delete
column, so the indirection was pure noise.
**#2 — router `dataInputSchema` duplicated the service schema.**
`packages/api/src/router/notification.ts` hand-rolled a
`z.partialRecord` structurally identical to the `dataSchema` inside
`packages/services/src/notification/schemas.ts`. Drift hazard: a
future provider-value shape change in the service would be accepted
at the tRPC layer and fail with an opaque error deeper in
`validateNotificationData`. Renamed the service schema to
`NotificationDataInputSchema`, exported it from the service barrel,
and replaced the router's local copy + the now-unused
`servicesNotificationProvider` alias.
**#5 — `update.ts` audit missing `provider` metadata.**
`createNotification` attaches `metadata: { provider: input.provider
}` to its audit row; `updateNotification` didn't. The `provider`
column is recoverable from the `before`/`after` rows, but asymmetric
metadata breaks simple `action + metadata.provider` audit queries.
Added `metadata: { provider: existing.provider }` for parity.
Skipped the two non-fixes: the `enrichNotificationsBatch` drizzle-cast
fragility is the same pattern as `monitor/list.ts`, worth a codebase-
wide change rather than a single-domain carve-out; the `dataSchema`
being "intentionally wide" is already called out in the schema JSDoc
and is correct by design (provider/payload alignment is enforced at
the service boundary by `validateNotificationData`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(services/notification): cover update audit + post-downgrade gate
Claude review noted two gaps in the `updateNotification` suite. Adding
both:
**`notification.update` audit row.** `createNotification` already
asserts an audit row fires with `expectAuditRow`; the update path was
silent. With the per-mutation audit contract (every write emits a
row), the update case needs an equivalent pin so a regression that
drops the emit site is caught. Note: the v1 audit buffer shape
(`AuditLogRecord`) doesn't carry `metadata`, so the `{ provider }`
payload can't be asserted directly here — that coverage lands with
the v2 audit-table move, called out in the test comment.
**Plan-gate after a downgrade.** The Cubic-flagged fix added
`assertProviderAllowed(existing.provider)` to `updateNotification`
so a previously-allowed channel becomes read-only once the workspace
drops to a plan that no longer includes it. The regression test
simulates the downgrade by directly inserti…
* feat(services): migrate maintenance domain onto service layer
Second domain migration, stacked on the status-report PR. Same shape as
PR 1 — every write path and read path for `maintenance` now goes through
`@openstatus/services/maintenance`; tRPC / Connect / Slack are thin
adapters.
## Services (`packages/services/src/maintenance/`)
- `createMaintenance`, `updateMaintenance`, `deleteMaintenance`,
`listMaintenances`, `getMaintenance`, `notifyMaintenance`.
- Date range validated in the Zod refine and again inside `updateMaintenance`
when a partial update could violate the invariant.
- All mutations inside `withTransaction`, emit `emitAudit` before returning.
- Internal helpers duplicated from status-report (`validatePageComponentIds`,
`updatePageComponentAssociations`, `getMaintenanceInWorkspace`,
`getPageComponentIdsForMaintenance`, `getPageComponentsForMaintenance`).
A third consumer should trigger the shared-helper extraction.
## Surfaces
- **tRPC** (`packages/api/src/router/maintenance.ts`): all four procedures
(`delete` / `list` / `new` / `update`) now call services.
`emailRouter.sendMaintenance` becomes a thin wrapper over
`notifyMaintenance`.
- **Connect RPC** (`apps/server/src/routes/rpc/services/maintenance/`):
handler rewritten as proto → parse → service → convert.
External contract preserved.
- **Slack** (`interactions.ts`): `createMaintenance` branch now calls
services. Also extracts `getPageUrl`/`getReportUrl` into
`apps/server/src/routes/slack/page-urls.ts` — pure transport-layer URL
formatting, left on db directly but isolated so `interactions.ts` itself
becomes db-free.
## Enforcement
- Biome `noRestrictedImports` override now includes:
`packages/api/src/router/maintenance.ts`,
`apps/server/src/routes/rpc/services/maintenance/**`, and
`apps/server/src/routes/slack/interactions.ts`.
(`page-urls.ts`, `service-adapter.ts`, `confirmation-store.ts`, and
`workspace-resolver.ts` remain outside scope — they legitimately need db
access for transport/URL/auth concerns.)
- Subpath export `@openstatus/services/maintenance`.
## Tests
- Integration tests in `packages/services/src/maintenance/__tests__/`
cover happy paths, workspace isolation, cascade deletes, cross-workspace
`NotFoundError` / `ForbiddenError`, the Zod date-range refine, and the
Slack actor audit branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/maintenance): address Cubic review
Mirrors the symmetric fixes applied to status-report on the parent
branch:
- **Dedupe `pageComponentIds`** in `validatePageComponentIds` — duplicate
ids would violate the composite PK on `maintenance_to_page_component`.
- **Batch list enrichment** — `listMaintenances` previously ran 2 extra
queries per row; rewritten as one IN query regardless of list size,
which pairs well with the 10_000 sentinel that tRPC passes.
- **Idempotent tRPC `delete`** — swallow `NotFoundError` in the wrapper
to preserve the old drizzle-returning behaviour; Connect still
returns 404.
- **Connect numeric-id error** — replace `invalidDateFormatError` with an
inline `ConnectError(Code.InvalidArgument)` for malformed page component
ids. Correct error message on the wire.
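A minimal sketch of the batched enrichment described above, under stated assumptions: the row shapes (`MaintenanceRow`, `LinkRow`) are hypothetical, and the in-memory `fetchLinksIn` stands in for the single IN query, since the real Drizzle query is not shown in this log.

```typescript
// Hypothetical row shapes; the real schema is not shown in this log.
type MaintenanceRow = { id: number; title: string };
type LinkRow = { maintenanceId: number; pageComponentId: number };

// Stand-in for one `WHERE maintenance_id IN (...)` query.
function fetchLinksIn(allLinks: LinkRow[], ids: number[]): LinkRow[] {
  const wanted = new Set(ids);
  return allLinks.filter((link) => wanted.has(link.maintenanceId));
}

// One lookup for the whole list, then group in memory by parent id.
function enrichMaintenancesBatch(rows: MaintenanceRow[], allLinks: LinkRow[]) {
  if (rows.length === 0) return [];
  const links = fetchLinksIn(
    allLinks,
    rows.map((row) => row.id),
  );
  const byMaintenance = new Map<number, number[]>();
  for (const link of links) {
    const bucket = byMaintenance.get(link.maintenanceId) ?? [];
    bucket.push(link.pageComponentId);
    byMaintenance.set(link.maintenanceId, bucket);
  }
  return rows.map((row) => ({
    ...row,
    pageComponentIds: byMaintenance.get(row.id) ?? [],
  }));
}
```

The query count stays constant however many rows the list returns, which is the property the fix is after.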
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/maintenance): parity with status-report post-review fixes
Three issues mirror what landed on status-report after its PR-review
pass — this branch was behind because it hadn't been rebased onto the
updated status-report yet.
**`update.ts` — pageId follows components**
The service was enforcing a strict invariant: throw `ConflictError`
when new components belonged to a different page, and leave `pageId`
untouched when components were cleared. The Connect
`UpdateMaintenance > clears pageId when removing all components`
test fails against this — it expects `pageId` to null out on empty
components. Same reasoning as the status-report fix: `pageId` should
follow the association set. Mixed-page inputs are still rejected
upstream by `validatePageComponentIds`.
**Handler test — error message wording**
Two assertions expected the old handler's `"Start time (from) must
be before end time"` message. Service throws `"End date must be
after start date."` via `ConflictError`. Updated both assertions to
match service wording — consistent with how the status-report
message drift was resolved.
**Service test — cleanup without try/finally**
Seven tests ran an inline `db.delete(maintenance)` at the end of their body;
a failing assertion skipped cleanup and orphaned rows. Replaced with
a shared `createdMaintenanceIds` array drained in `afterEach`,
matching the status-report pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): Slack error opacity + maintenance test drift
Three CI failures on #2103:
**Slack `toSlackMessage` — revert to uniform generic message.**
The P3 fix on #2101 kept hand-written `ServiceError` messages for
`FORBIDDEN` / `CONFLICT` / `VALIDATION` / `LIMIT_EXCEEDED` — but the
existing `does not leak internal error details to user` test expects
every error to surface as a single `"Something went wrong. Please try
again."` string. The intent is stronger than just "strip row IDs":
Slack users aren't developers, so none of the service error wording
is appropriate for them. Reduced the adapter to a one-liner that
always returns the generic message; detailed error context still
flows to logtape/Sentry via the catch site upstream.
**Handler test — "does not match the page ID" → "does not match".**
`ConflictError` wording drifted during the migration (same as the
status-report fix). Loosened to the stable fragment.
**Handler test — "Page not found" → "not found".**
`NotFoundError("page", id)` formats as `"page <id> not found"`
(lowercase `page`). Pre-migration handler emitted capital-P
`"Page not found"`. Loosened the substring so both forms pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate incident domain (PR 3/N) (#2104)
* feat(services): migrate incident domain onto service layer
Third domain migration, stacked on maintenance. tRPC-only — no Connect
handler, no Slack path — so this PR mostly exercises the
`ServiceContext` actor variants without surface-specific adapters.
## Services (`packages/services/src/incident/`)
- `acknowledgeIncident`, `resolveIncident`, `deleteIncident`,
`listIncidents`, `getIncident`.
- Both `acknowledge` / `resolve` stamp the acting user's id onto
`acknowledged_by` / `resolved_by` via the new `tryGetActorUserId`
helper on `ServiceContext`. Non-user actors (system, webhook, API keys
without a linked userId) stamp `null`.
- Throws `ConflictError` when an incident is already acknowledged
or resolved, matching the existing tRPC `BAD_REQUEST` semantics.
- All mutations wrapped in `withTransaction`, emit `emitAudit`.
## list.ts
- Batch-enriches monitors via a single IN query against distinct
`monitorId`s, which pairs well with the 10_000 sentinel that tRPC passes.
Avoids the N+1 Cubic flagged on the maintenance PR.
## Surfaces
- **tRPC** (`packages/api/src/router/incident.ts`): all four procedures
call services. `delete` catches `NotFoundError` to preserve the
pre-migration idempotent behaviour. `list` narrows `monitor` to
non-null in the return type — every incident is expected to have an
associated monitor via the FK. `acknowledge` / `resolve` still return
`true` to match the old contract.
## Context helper
- `tryGetActorUserId(actor)` in `packages/services/src/context.ts` —
returns the openstatus user id for user/apiKey/slack actors when
available, `null` for system/webhook. Exported from the root barrel.
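A rough sketch of the helper's behaviour as described above. The `Actor` union and its field names are assumptions made for illustration; the real `ServiceContext` actor type is not shown in this log.

```typescript
// Hypothetical actor union; field names are illustrative only.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string; userId?: number }
  | { type: "slack"; teamId: string; userId?: number }
  | { type: "system" }
  | { type: "webhook" };

// Returns the acting user's id when one is linked, otherwise null.
function tryGetActorUserIdSketch(actor: Actor): number | null {
  switch (actor.type) {
    case "user":
      return actor.userId;
    case "apiKey":
    case "slack":
      // These actors may or may not be linked to a user.
      return actor.userId ?? null;
    case "system":
    case "webhook":
      return null;
  }
}
```

Returning `null` instead of throwing lets `acknowledge` / `resolve` stamp the column unconditionally, whatever the actor type.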
## Enforcement
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/incident.ts`.
- Subpath export `@openstatus/services/incident`.
## Tests
- `packages/services/src/incident/__tests__/incident.test.ts` — happy
paths for acknowledge / resolve / delete, already-acknowledged and
already-resolved `ConflictError`, workspace isolation across all
mutations, list workspace isolation, and the batch monitor enrichment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/incident): address Cubic review
- **TOCTOU on acknowledge / resolve** — the old read-then-update allowed
two concurrent acknowledgers (or resolvers) to both succeed. Replaced
with a conditional update (`WHERE acknowledged_at IS NULL` /
`WHERE resolved_at IS NULL`); the loser of the race gets no row back
and we throw the same `ConflictError` the pre-read would have raised.
Dropped the now-unreachable `InternalServiceError` branch.
- **Monitor enrichment workspace scope** — `enrichIncidentsBatch` now
filters `monitor.workspaceId = ctx.workspace.id` alongside the
`inArray` on monitor id. Defence-in-depth: the `incident.monitorId`
column has no FK against workspace ownership, so a cross-workspace
pointer (however unlikely) no longer leaks the other workspace's
monitor row.
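The conditional-update fix can be sketched in memory as follows. This is an illustration under assumptions, not the service code: `IncidentRow`, `conditionalAcknowledge`, and the local `ConflictError` class are hypothetical stand-ins, and the `Map` plays the role of the table.

```typescript
// Local stand-in for the service error; checked via `code` in this sketch.
class ConflictError extends Error {
  readonly code = "CONFLICT";
}

type IncidentRow = {
  id: number;
  acknowledgedAt: Date | null;
  acknowledgedBy: number | null;
};

// Stand-in for `UPDATE ... WHERE id = ? AND acknowledged_at IS NULL RETURNING *`.
// The check and the write are one step, so only one caller can match the row.
function conditionalAcknowledge(
  table: Map<number, IncidentRow>,
  id: number,
  userId: number | null,
): IncidentRow | undefined {
  const row = table.get(id);
  if (row === undefined || row.acknowledgedAt !== null) return undefined;
  row.acknowledgedAt = new Date();
  row.acknowledgedBy = userId;
  return row;
}

function acknowledgeIncidentSketch(
  table: Map<number, IncidentRow>,
  id: number,
  userId: number | null,
): IncidentRow {
  const updated = conditionalAcknowledge(table, id, userId);
  // No row back means another caller got there first. A real service would
  // also distinguish a missing incident; that is omitted here for brevity.
  if (updated === undefined) {
    throw new ConflictError(`incident ${id} is already acknowledged`);
  }
  return updated;
}
```

The second caller never overwrites the first acknowledger's stamp, because the predicate no longer matches once `acknowledgedAt` is set.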
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate monitor domain tRPC (PR 4/N) (#2105)
* feat(services): migrate monitor domain tRPC onto service layer
Fourth domain, the largest so far. Migrates the **tRPC** monitor router
(all 14 procedures) onto `@openstatus/services/monitor`. The Connect
handler (`apps/server/src/routes/rpc/services/monitor/`) and the v1 REST
`apps/server/src/routes/v1/monitors/*` endpoints stay untouched for now
— they are separate external-API surfaces and warrant their own
follow-up PRs with their own test suites.
- `createMonitor`, `cloneMonitor`, `deleteMonitor`, `deleteMonitors`
(bulk soft-delete), `getMonitor`, `listMonitors`.
- `updateMonitorGeneral` — preserves the existing tRPC `updateGeneral`
behaviour including jobType switching (HTTP ↔ TCP ↔ DNS). Called out
as a code smell in the explore, but preserved intentionally since
it's the dashboard's current edit flow. Kept jobType-agnostic rather
than split into 3 separate update methods.
- `updateMonitorRetry`, `updateMonitorFollowRedirects`,
`updateMonitorOtel`, `updateMonitorPublic`,
`updateMonitorResponseTime` — field-specific setters.
- `updateMonitorSchedulingRegions` — enforces plan limits
(periodicity / region-count / region-access) + validates the
private-location ids before replacing the association set.
- `updateMonitorTags`, `updateMonitorNotifiers` — validate and
replace the tag / notifier association sets.
- `bulkUpdateMonitors` — batched toggle of `public` / `active`
across multiple monitor ids. Matches the old `updateMonitors`
procedure.
- All mutations run inside `withTransaction`, emit `emitAudit`.
- Cascade deletes: `monitor_tag_to_monitor`,
`notifications_to_monitors`, `page_component` rows are torn down on
delete (matches pre-migration behaviour — bypasses FK cascades
because some rows reference the monitor without cascade).
- `validateTagIds` / `validateNotificationIds` / `validatePrivateLocationIds`
— workspace-scoped validators with dedupe.
- `pickDefaultRegions(workspace)` — ports the old "randomly pick 4
free / 6 paid regions excluding deprecated" logic.
- `serialiseAssertions` / `headersToDbJson` — assertion / header
serialisation moved out of the tRPC router.
- `countMonitorsInWorkspace` for quota checks.
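As an illustration of the "pick 4 free / 6 paid regions excluding deprecated" rule mentioned above, here is a hedged sketch. The `Region` shape, the `plan` parameter, and the shuffle are assumptions; the real `pickDefaultRegions(workspace)` signature and its source of region data are not reproduced here.

```typescript
// Hypothetical region shape; the real region dictionary is not shown in this log.
type Region = { code: string; deprecated: boolean };

function pickDefaultRegionsSketch(
  regions: Region[],
  plan: "free" | "paid",
  random: () => number = Math.random,
): string[] {
  const count = plan === "free" ? 4 : 6;
  const pool = regions.filter((r) => !r.deprecated).map((r) => r.code);
  // Fisher-Yates shuffle so every non-deprecated region is equally likely.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = pool[i] as string;
    pool[i] = pool[j] as string;
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}
```

Passing `random` as a parameter is a choice made for this sketch only; it keeps the picker testable with a seeded generator.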
`listMonitors` does two IN queries (tags + incidents) regardless of
list size. `getMonitor` reuses the same path with the richer
`{ notifications, privateLocations }` toggle and a singleton. Same
pattern as status-report's batched fix.
- **tRPC** (`packages/api/src/router/monitor.ts`): every procedure is a
thin wrapper. `delete` catches `NotFoundError` for idempotency.
`new` / `updateGeneral` keep the `testHttp` / `testTcp` / `testDns`
pre-save check at the tRPC layer — services unconditionally save,
callers decide whether to pre-validate.
- Connect RPC handler at `apps/server/src/routes/rpc/services/monitor/`
— still uses its own helpers. Will be migrated in a follow-up
(4b).
- v1 REST endpoints at `apps/server/src/routes/v1/monitors/*` — yet
another external surface; dedicated PR later.
- `packages/api/src/service/import.ts` monitor writes — covered by
the plan's dedicated PR 9.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/monitor.ts`. The Connect monitor handler
stays out of scope until 4b.
- Subpath export `@openstatus/services/monitor`.
- `packages/services/src/monitor/__tests__/monitor.test.ts` —
create (http / tcp / dns), delete cascade, bulk delete, clone,
tags / notifiers validation (forbidden when cross-workspace),
list / get with workspace isolation + soft-delete hiding,
updateMonitorGeneral round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): declare @openstatus/regions + @openstatus/assertions deps
The monitor domain pulls `regionDict` from `@openstatus/regions` (for
the default-region picker) and assertion classes / validators from
`@openstatus/assertions`. Both were missing from
`packages/services/package.json`, so Vercel's Next.js build for
`apps/status-page` — which transitively imports the services package
via `@openstatus/api` — failed with "Module not found" on
`packages/services/src/monitor/internal.ts:20` and `schemas.ts:1`.
Local pnpm workspaces resolved the imports via hoisting, which masked
the missing declarations until an actual Next.js build forced strict
resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/monitor): address Cubic review
- **`timeout` / `degradedAfter` bounds** (schemas.ts) — mirror the
0–60_000 ms cap from `insertMonitorSchema`. Values outside the range
are rejected before the UPDATE instead of being silently persisted.
- **`jsonBody` assertion mapping** (internal.ts) — the serialiser was
silently dropping `jsonBody`-typed input because the runtime class
wasn't wired up. Added the `JsonBodyAssertion` branch; the class has
existed in `@openstatus/assertions` the whole time, this was just a
missing case.
- **Clone resets `status`** (clone.ts) — cloning no longer inherits the
source's current `error`/`degraded` health. Freshly cloned monitors
start at `"active"` and settle on their first check.
- **Delete filters soft-deleted** (delete.ts) — the pre-check now
includes `isNull(monitor.deletedAt)`, so a repeat delete returns
`NotFoundError` (preserved as idempotent at the tRPC layer) instead
of re-running the cascades and emitting duplicate audits.
- **List/get workspace scope on relations** (list.ts) — `enrichMonitorsBatch`
takes `workspaceId` and scopes the incident / tag / notification /
private-location IN queries to the caller's workspace. Defence-in-depth
against inconsistent FK data (none of those tables enforce workspace
ownership at the FK level).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate notification CRUD (PR 5/N) (#2106)
* feat(services): migrate notification CRUD onto service layer
Fifth domain. Migrates the four CRUD procedures — `list`, `new`,
`updateNotifier`, `delete` — of the notification tRPC router onto
`@openstatus/services/notification`. The three integration-helper
procedures (`sendTest`, `createTelegramToken`, `getTelegramUpdates`)
stay inline at the tRPC layer and are explicitly out of scope for this
migration.
- `createNotification` — enforces plan limits on both the channel count
(`notification-channels`) and the provider itself (`sms` / `pagerduty`
/ `opsgenie` / `grafana-oncall` / `whatsapp` require plan flags);
validates the loose `data` payload against `NotificationDataSchema`;
validates monitor ids are in-workspace and not soft-deleted.
- `updateNotification` — replaces name / data / monitor associations in
a single transaction with the same validation rules as create.
- `deleteNotification` — hard-delete (FK cascade clears associations).
The tRPC wrapper swallows `NotFoundError` to preserve the old
idempotent behaviour.
- `listNotifications`, `getNotification` — batched IN query enriches
monitors per notification. Monitor enrichment is workspace-scoped and
filters soft-deleted monitors for defence-in-depth.
- All mutations run inside `withTransaction`, emit `emitAudit`.
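The "swallow `NotFoundError` in the wrapper" pattern recurs across these migrations. A minimal synchronous sketch, assuming a hypothetical in-memory store and a local `NotFoundError` class (the real service calls are awaited and take a `ServiceContext`):

```typescript
// Local stand-in for the service error; checked via `code` in this sketch.
class NotFoundError extends Error {
  readonly code = "NOT_FOUND";
}

const isNotFound = (e: unknown): boolean =>
  typeof e === "object" &&
  e !== null &&
  (e as { code?: unknown }).code === "NOT_FOUND";

// Hypothetical service: strict, throws when the row is absent.
function deleteNotificationSketch(store: Map<number, string>, id: number): void {
  if (!store.delete(id)) {
    throw new NotFoundError(`notification ${id} not found`);
  }
}

// Wrapper: a repeat delete is a no-op; every other error still propagates.
function deleteIdempotent(store: Map<number, string>, id: number): void {
  try {
    deleteNotificationSketch(store, id);
  } catch (e) {
    if (isNotFound(e)) return;
    throw e;
  }
}
```

Keeping the service strict and relaxing only the wrapper means each surface decides for itself whether a missing row is an error.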
- **tRPC** (`packages/api/src/router/notification.ts`): `list` / `new` /
`updateNotifier` / `delete` become thin service wrappers. `sendTest`
+ `createTelegramToken` + `getTelegramUpdates` are unchanged.
- **`sendTest` migration** — the dispatch switch imports from 10
`@openstatus/notification-*` packages. Moving it into services would
pull those as direct deps; the plan's phrasing ("Channel CRUD +
test-dispatch") allows this as a later extraction.
- **`createTelegramToken` / `getTelegramUpdates`** — redis + external
Telegram API helpers; transport UX, not domain operations.
- **Biome scope for `notification.ts`** — the file still imports
`@openstatus/db/src/schema` for the `sendTest` provider data schemas.
Will land with the sendTest migration follow-up.
- **Connect RPC notification handler** — stays on its own helpers;
follow-up aligned with PR 4's Connect deferral.
- `__tests__/notification.test.ts` covers create (including
`ValidationError` on malformed data, `LimitExceededError` on gated
provider, `ForbiddenError` on cross-workspace monitor), update
(association replacement, cross-workspace `NotFoundError`), delete,
list/get workspace isolation + monitor enrichment scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): address Cubic review
- **Update flow plan gate** (update.ts) — `updateNotification` was
skipping `assertProviderAllowed`, so a user who downgraded their plan
could still edit a notification configured with a now-restricted
provider. Re-check against the stored `existing.provider` to match
the create-time gate.
- **Provider / data match** (internal.ts) — `NotificationDataSchema` is
a union, so `{ provider: "discord", data: { slack: "…" } }` passed
the union check even though the payload key doesn't match the
provider. `validateNotificationData` now takes the provider and
asserts `provider in data` after the top-level parse. Applied in
both `create` and `update` (update uses the stored provider since
the API doesn't allow provider changes).
Added a test for the mismatch case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/notification): tighten data validation against provider schema
Cubic's follow-up on the previous fix was right: checking only
`provider in data` isn't enough. `NotificationDataSchema` is a union,
so a payload like `{ discord: "not-a-url", slack: "valid-url" }` passes
because the union matches the slack variant — the extra `discord` key
is ignored, and my key-presence check sees `"discord"` and lets it
through.
Replaced the union parse + key check with a provider-specific schema
lookup (`providerDataSchemas[provider].safeParse(data)`). Each
canonical channel schema is keyed by its provider name and validates
the shape / content of the value, so the new check catches both the
mismatched-provider and malformed-payload cases in one pass.
Added a test covering the exact case Cubic flagged — invalid `discord`
URL alongside a valid `slack` URL now rejects with ValidationError.
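To make the difference concrete, here is a small sketch that swaps Zod for plain predicate functions so it stays dependency-free. `providerValidators`, `validateAsUnion`, and `validateForProvider` are hypothetical names, and only two providers are modelled.

```typescript
type Provider = "discord" | "slack";
type Payload = Record<string, unknown>;

const isHttpsUrl = (v: unknown): boolean =>
  typeof v === "string" && /^https:\/\/\S+$/.test(v);

// One validator per provider, keyed by provider name.
const providerValidators: Record<Provider, (data: Payload) => boolean> = {
  discord: (data) => isHttpsUrl(data.discord),
  slack: (data) => isHttpsUrl(data.slack),
};

// Union-style check: passes as soon as ANY variant matches.
function validateAsUnion(data: Payload): boolean {
  return (Object.keys(providerValidators) as Provider[]).some((p) =>
    providerValidators[p](data),
  );
}

// Provider-specific check: only the declared provider's validator runs.
function validateForProvider(provider: Provider, data: Payload): boolean {
  return providerValidators[provider](data);
}

const flagged = { discord: "not-a-url", slack: "https://hooks.example.com/x" };
const unionAccepts = validateAsUnion(flagged);
const discordAccepts = validateForProvider("discord", flagged);
```

Here `unionAccepts` is `true` because the slack variant matches, while `discordAccepts` is `false`, which is the case the provider-keyed lookup is meant to catch.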
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page-component domain (PR 6/N) (#2107)
* feat(services): migrate page-component domain onto service layer
Sixth domain, tRPC-only. Migrates `list`, `delete`, and `updateOrder`
onto `@openstatus/services/page-component`.
- `listPageComponents` — workspace-scoped filter + optional pageId
filter. Batched enrichment in four IN queries (monitors, groups,
status reports via join, maintenances via join). All relation queries
scoped to the caller's workspace for defence-in-depth.
- `deletePageComponent` — hard-delete. Cascade clears the
`status_report_to_page_component` / `maintenance_to_page_component`
associations. The tRPC wrapper swallows `NotFoundError` to preserve
the pre-migration idempotent behaviour.
- `updatePageComponentOrder` — the complex one. Mirrors the existing
diff-and-reconcile pass faithfully (≈220 lines → a single transaction):
1. Assert the page is in the workspace.
2. Enforce the workspace's `page-components` plan cap.
3. Validate every monitor id in the input set.
4. Remove monitor components whose monitorId isn't in the input;
remove static components based on whether the input carries ids.
5. Clear `groupId` before dropping groups (FK safety), then recreate
groups.
6. Upsert monitor components via `onConflictDoUpdate` on the
`(pageId, monitorId)` unique constraint (preserves ids).
7. Update existing static components by id; insert new ones.
Audit: `page_component.update_order` / `page_component.delete`.
- **tRPC** (`packages/api/src/router/pageComponent.ts`): all three
procedures call services. `delete` catches `NotFoundError` and
returns the old `drizzle.returning()`-shaped empty array. The
pre-existing `pageComponent.test.ts` (tests cross-workspace monitorId
→ `TRPCError(FORBIDDEN)`) is untouched and still valid — my services
throw `ForbiddenError`, which `toTRPCError` maps to the same code.
- Biome `noRestrictedImports` scope adds
`packages/api/src/router/pageComponent.ts`.
- Subpath export `@openstatus/services/page-component`.
- `__tests__/page-component.test.ts` covers `updatePageComponentOrder`
happy path (creates monitor + static + grouped components), rejects
cross-workspace monitorId and cross-workspace pageId, `list`
workspace isolation, `delete` cross-workspace `NotFoundError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): Connect RPC notification handler catch-up (#2108)
* feat(services): Connect RPC notification handler onto services (catch-up)
Follow-up to PR 5 — noticed on review that my PRs from PR 4 onwards had
been narrowing scope to tRPC only and deferring Connect handlers, which
was piling up. This closes the notification Connect gap.
`apps/server/src/routes/rpc/services/notification/index.ts` — the five
CRUD methods now delegate to `@openstatus/services/notification`:
- `createNotification` → `createNotification` service (handles the
plan-count limit, per-provider plan gate, and data-schema validation
internally — the Connect-side `checkNotificationLimit` /
`checkProviderAllowed` / `validateProviderDataConsistency` calls are
gone).
- `getNotification`, `listNotifications`, `updateNotification`,
`deleteNotification` — thin proto-to-service-to-proto wrappers.
- `updateNotification` reads the existing record via the service and
fills in missing fields (Connect's update is partial; the service
expects a full payload), then applies the update.
Left inline:
- `sendTestNotification` — calls `test-providers.ts` (external HTTP).
- `checkNotificationLimit` RPC method — returns the count info via
`./limits.ts` helpers (pure queries, no domain mutation).
The local Connect helpers (`validateProviderDataConsistency`,
`checkNotificationLimit`, `checkProviderAllowed`, and the ad-hoc
`validateMonitorIds` / `updateMonitorAssociations` / `getMonitorById` /
`getMonitorCountForNotification` / `getMonitorIdsForNotification`) are
no longer imported by `index.ts`; they remain in their files because
`test-providers.ts` and the unmigrated Connect monitor handler still
reference some of them.
Added `apps/server/src/routes/rpc/services/notification/index.ts` to
the `noRestrictedImports` scope. The directory-level glob isn't a fit
because `limits.ts` and `test-providers.ts` legitimately need direct
db access until their own follow-up migrations.
- **Connect monitor handler** (~880 lines, 6 jobType-specific
create/update methods + 3 external-integration methods) — requires a
much bigger refactor. Flagged as dedicated PR 4b; tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): dedupe monitor ids in Connect createNotification response
Cubic's P2 catch: the service dedupes `monitors` before the insert
(via `validateMonitorIds` in the services package), but the Connect
handler echoed `req.monitorIds` verbatim back in the response. For an
input like `["1", "1", "2"]` the DB stored `[1, 2]` while the response
claimed `["1", "1", "2"]` — caller state diverges from persistence.
Echo `Array.from(new Set(req.monitorIds))` instead so the response
matches what's actually stored.
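A tiny illustration of the echo fix; `echoMonitorIds` is a hypothetical helper name, and only the `Array.from(new Set(...))` expression comes from the change itself.

```typescript
// Dedupe while preserving first-seen order, mirroring what gets stored.
function echoMonitorIds(requested: string[]): string[] {
  return Array.from(new Set(requested));
}

const echoed = echoMonitorIds(["1", "1", "2"]);
```

A `Set` iterates in insertion order, so the echoed ids keep the caller's ordering minus the duplicates.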
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate page authoring (PR 7/N) (#2109)
* feat(services): migrate page (status-page authoring) onto service layer
Seventh domain. Migrates the 13 authoring procedures in
`pageRouter` onto `@openstatus/services/page`. Deliberately scoped to
authoring CRUD only:
- `statusPage.ts` — public viewer endpoints (subscribe / get / uptime /
report / verify / unsubscribe) are a separate surface that doesn't
use the authenticated `ServiceContext`; dedicated follow-up.
- Connect `apps/server/src/routes/rpc/services/status-page/**` — ~1500
lines with 18 methods (page CRUD + components + groups + subscribers
+ view). Too big for this PR; dedicated follow-up, same shape as the
Connect monitor deferral.
## Services (`packages/services/src/page/`)
- `createPage` / `newPage` — full vs minimal create; both enforce the
`status-pages` plan cap and (for `createPage`) the per-access-type
plan gates (password-protection, email-domain-protection, ip-
restriction, no-index).
- `deletePage` — FK cascade clears components / groups / reports /
subscribers.
- `listPages` — batched enrichment with `statusReports`.
- `getPage` — enriched with `maintenances` / `pageComponents` /
`pageComponentGroups`.
- `getSlugAvailable` — pure check against `subdomainSafeList` + DB.
- `updatePageGeneral` — with slug-uniqueness re-check on change.
- `updatePageCustomDomain` — persists the DB change and returns the
previous domain so the caller can diff. Vercel add/remove stays at
the tRPC layer (external integration).
- `updatePagePasswordProtection` — re-applies the same plan gates
the `create` path uses.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales`
(gated on `i18n` plan flag), `updatePageConfiguration`.
- Audit action emitted for every mutation.
## tRPC (`packages/api/src/router/page.ts`)
All 13 procedures are thin wrappers. `delete` catches `NotFoundError`
for idempotency. `updateCustomDomain` orchestrates:
1. `getPage` (via service) to read the existing domain.
2. `addDomainToVercel` / `removeDomainFromVercel` as needed.
3. `updatePageCustomDomain` (via service) to persist.
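The three steps above can be sketched with injected dependencies. This is a synchronous illustration under assumptions: the `Deps` shape, the empty string meaning "no domain", and the add-before-remove ordering are all hypothetical; the real calls are async and their signatures are not shown here.

```typescript
// Hypothetical dependency shape so the sketch runs without Vercel or a database.
type Deps = {
  getPage(id: number): { id: number; customDomain: string };
  addDomainToVercel(domain: string): void;
  removeDomainFromVercel(domain: string): void;
  updatePageCustomDomain(id: number, domain: string): void;
};

function updateCustomDomainSketch(deps: Deps, pageId: number, nextDomain: string): void {
  // 1. Read the existing domain through the service.
  const previous = deps.getPage(pageId).customDomain;
  // 2. Touch the external integration only when the domain actually changes.
  if (previous !== nextDomain) {
    if (nextDomain !== "") deps.addDomainToVercel(nextDomain);
    if (previous !== "") deps.removeDomainFromVercel(previous);
  }
  // 3. Persist through the service.
  deps.updatePageCustomDomain(pageId, nextDomain);
}
```

Keeping the Vercel calls outside the service leaves the service free of external side effects, which matches the split described in the list.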
- Biome scope adds `packages/api/src/router/page.ts`. The router
imports `insertPageSchema` via the services re-export
(`CreatePageInput`) so the db-import ban applies cleanly.
- Subpath export `@openstatus/services/page`.
- `__tests__/page.test.ts` covers `newPage` happy / reserved /
duplicate, `createPage` monitor attachment + cross-workspace monitor,
`updatePageGeneral` rename + duplicate-slug conflict + cross-workspace,
`updatePageLocales` plan gate, list / get / slug-available workspace
isolation, delete cross-workspace NotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate Connect status-page page CRUD onto services
Extends PR #2109 to cover the Connect RPC status-page handler's page
CRUD surface (create / get / list / update / delete), matching the
migration that landed for tRPC's `pageRouter`. The other 13 methods
(components, groups, subscribers, viewer) still read the db directly —
they're separate domains that'll need their own services in follow-ups.
- create / get / delete call into `@openstatus/services/page` and
preserve the granular Connect errors (`statusPageNotFoundError`,
`slugAlreadyExistsError`) by pre-checking before the service call or
catching `NotFoundError` → re-throwing the richer variant.
- list fetches via the service and paginates in-memory; status-page
quota is bounded per workspace so the extra enrichment is negligible.
- update loads the existing page via the service, then orchestrates the
per-section updates (`updatePageGeneral`, `updatePageLinks`,
`updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`,
`updatePagePasswordProtection`) inside a shared transaction so a
partial failure can't leave the page half-updated. Each service's
internal `withTransaction` detects the pre-opened tx and skips
nesting.
- Proto-specific format validations (https icon URL, custom-domain
regex, IPv4 CIDR, email-domain shape) and the i18n PermissionDenied
path stay at the handler — they don't exist in the zod insert schema
and their error codes would change if deferred to the service.
- `Page` from the service parses `authEmailDomains` / `allowedIpRanges`
into arrays, while the converters (still used by the unmigrated
methods) expect the comma-joined string form. `serviceToConverterPage`
bridges the two shapes at the call sites that need it.
Biome scope deliberately unchanged: the file still imports from
`@openstatus/db` for the 13 legacy methods, so the override would
light up the whole file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Cubic review on #2109
Four issues flagged across two Cubic reviews:
- `createPage` skipped `assertSlugAvailable`, so full-form creates
could bypass reserved/duplicate slug validation and either create a
duplicate or fail late on a DB constraint instead of the clean
`ConflictError`. Added the check alongside the existing quota gate.
- `createPage` passed `passwordProtected` / `allowedIpRanges` but not
`allowIndex` to `assertAccessTypeAllowed`, bypassing the `no-index`
plan gate on create. Now forwarded.
- `UpdatePagePasswordProtectionInput.allowedIpRanges` accepted arbitrary
strings. Mirrored the CIDR validation from `insertPageSchema` — bare
IPs get `/32` appended, everything pipes through `z.cidrv4()`.
- `updatePagePasswordProtection` wrote `authEmailDomains:
input.authEmailDomains?.join(",")`, which evaluates to `undefined`
when the caller clears the field. Drizzle treats `undefined` as
"skip this column" on `.set()`, so stale email domains survived an
access-type switch. Added the `?? null` fallback to match the
neighboring `allowedIpRanges` line. This fixes the Connect
`updateStatusPage` path where switching away from AUTHENTICATED sets
`nextAuthEmailDomains = undefined` expecting the column to clear.
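The last fix hinges on how `undefined` and `null` differ in an update payload. A dependency-free sketch follows, where `applySet` is a hypothetical stand-in that mimics the "undefined means skip this column" behaviour described above:

```typescript
type PageRow = { authEmailDomains: string | null };
type PagePatch = { authEmailDomains?: string | null };

// Mimics an update that skips undefined fields and writes everything else.
function applySet(row: PageRow, patch: PagePatch): PageRow {
  const next = { ...row };
  if (patch.authEmailDomains !== undefined) {
    next.authEmailDomains = patch.authEmailDomains;
  }
  return next;
}

// Before the fix: a cleared field becomes undefined and the column is skipped.
const toColumnBuggy = (domains?: string[]) => domains?.join(",");
// After the fix: a cleared field becomes null and the column is overwritten.
const toColumnFixed = (domains?: string[]) => domains?.join(",") ?? null;

const storedPage: PageRow = { authEmailDomains: "example.com" };
const afterBuggy = applySet(storedPage, { authEmailDomains: toColumnBuggy(undefined) });
const afterFixed = applySet(storedPage, { authEmailDomains: toColumnFixed(undefined) });
```

In this sketch `afterBuggy` still carries `"example.com"`, while `afterFixed` has the column cleared to `null`.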
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N) (#2110)
* feat(services): migrate workspace / user / invitation / api-key (PR 8/N)
Stacked on PR #2109. Eighth migration — four small domains consolidated into
one PR because each is narrow (roughly two to five procedures) and they
share no structural dependencies beyond already-migrated infrastructure.
**workspace** — `getWorkspace`, `getWorkspaceWithUsage` (pages + monitors +
notifications + page-components batched via drizzle relations),
`listWorkspaces` (takes `userId` explicitly since `list` runs across every
workspace the user has access to), `updateWorkspaceName`.
**user** — `getUser` (active, non-soft-deleted), `deleteAccount` (the paid-
plan guardrail stays; removes non-owned memberships, sessions, OAuth
accounts and blanks the PII columns inside a single tx).
**invitation** — `createInvitation` (plan gate counts pending invites
against the members cap so two outstanding invites can't both accept past
the limit), `deleteInvitation`, `listInvitations`, `getInvitationByToken`
(scoped by token **and** accepting email to prevent token-sharing),
`acceptInvitation` (stamps acceptedAt + inserts membership atomically).
**api-key** — `createApiKey` (returns plaintext token once), `revokeApiKey`
(workspace-scoped existence check inside the tx so concurrent revokes
resolve to a consistent NotFound rather than a silent no-op),
`listApiKeys` (replaces the legacy per-row `Promise.all` fan-out with a
single IN query for creator enrichment), `verifyApiKey` +
`updateApiKeyLastUsed` (no ctx required — the verify path runs before
workspace resolution and callers pass an optional `db` override).
All 14 procedures become thin `try { return await serviceFn(...) } catch
{ toTRPCError }` wrappers. Router shapes stay identical so the dashboard
needs no changes. Connect + Slack don't expose these domains today;
migrating their consumers is a follow-up.
Biome `noRestrictedImports` override adds the four router files. Subpath
exports `@openstatus/services/{workspace,user,invitation,api-key}` added
to the services package.
Deletes `packages/api/src/service/apiKey.ts` and its tests — fully
superseded by `packages/services/src/api-key/`. The auth middleware in
`apps/server` has its own inline apiKey verification and is unaffected.
- **`domain.ts`** — pure Vercel-API proxy with no DB usage; not part of
the migration surface. Stays as-is.
- **`packages/api/src/service/{import,telegram-updates}.ts`** — import
migration is PR 9; telegram-updates stays for a follow-up.
Per-domain `__tests__/*.test.ts` covers: workspace rename + audit, usage
counts, members cap hit on free plan, invitation token-mismatch rejection,
accept idempotency, api-key creation returning a bcrypt hash, list creator
enrichment, revoke NotFoundError on unknown ids, verifyApiKey happy / bad-
format / wrong-body paths, lastUsed debounce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Cubic review on #2110
Four issues flagged on PR 8:
- **P1 — `invitation/accept.ts`**: the read-then-write pattern let two
concurrent accepts both pass the `isNull(acceptedAt)` check and race
through the membership insert. Replaced with a conditional UPDATE that
re-asserts `isNull(acceptedAt)` in the WHERE clause and checks
`.returning()` rowcount. The loser gets `ConflictError`, the tx aborts
before membership inserts run.
- **P2 — `api-key/create.ts`**: `createdById` was taken from input and
the router spliced in `ctx.user.id`. Since that column is attribution
data (who owns the key, who the audit row blames), trusting input
would let any caller forge ownership. Derived from `ctx.actor` via
`tryGetActorUserId`; actors without a resolvable user id (system /
webhook / unlinked api-key) now get `UnauthorizedError` instead of a
silent NULL write. `createdById` removed from the input schema.
- **P2 — `invitation/delete.ts`**: audit row was emitted even when the
DELETE matched zero rows (unknown id / wrong workspace). Switched to
`.returning({ id })` and short-circuit before the audit emit so the
log only reflects actual deletions.
- **P2 — `invitation/list.ts`**: the `if (!input.email)` →
`UnauthorizedError` branch in `getInvitationByToken` was unreachable
because `z.email()` already rejects empty / malformed emails at
`.parse()`. Removed the dead branch; the router keeps its own
pre-call check for `ctx.user.email`, so the transport-level
UnauthorizedError path is preserved.
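The conditional-UPDATE idea from the P1 fix can be sketched in memory. `markAccepted` below is a hypothetical stand-in for an `UPDATE ... WHERE id = ? AND accepted_at IS NULL RETURNING id` statement; the sketch folds the "unknown id" case into the same error, which the real service may well distinguish.

```typescript
class ConflictError extends Error {
  readonly code = "CONFLICT";
}

type Invitation = { id: number; acceptedAt: Date | null };
type Membership = { invitationId: number; userId: number };

// Check and write happen in one step, so only the first caller sees a row.
function markAccepted(rows: Invitation[], id: number, now: Date): number[] {
  const updatedIds: number[] = [];
  for (const row of rows) {
    if (row.id === id && row.acceptedAt === null) {
      row.acceptedAt = now;
      updatedIds.push(row.id);
    }
  }
  return updatedIds;
}

function acceptInvitation(
  invitations: Invitation[],
  memberships: Membership[],
  id: number,
  userId: number,
): Membership {
  const updated = markAccepted(invitations, id, new Date());
  if (updated.length === 0) {
    // Loser of the race: abort before any membership is inserted.
    throw new ConflictError(`Invitation ${id} is not pending`);
  }
  const membership: Membership = { invitationId: id, userId };
  memberships.push(membership);
  return membership;
}
```

The second caller finds zero updated rows and stops before the membership insert, which is the property the read-then-write version lacked.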
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services): migrate import domain (PR 9/N) (#2111)
* feat(services): migrate import domain (PR 9/N)
Stacked on PR #2110. Ninth and final domain — lifts the ~1,000-line
`packages/api/src/service/import.ts` orchestrator into the services
package as its own `@openstatus/services/import` domain.
Split into focused files:
- **`schemas.ts`** — `PreviewImportInput` / `RunImportInput` zod. Provider
discriminator + per-provider page-id fields live here; options schema
is separately exported for callers that want to pre-validate.
- **`provider.ts`** — `createProvider` factory + `buildProviderConfig`
reshape helper, isolated from the orchestrator so adding a provider is
a one-file change.
- **`limits.ts`** — `addLimitWarnings` (shared by preview + run). Pure
mutation on the `ImportSummary` argument; no writes.
- **`utils.ts`** — `clampPeriodicity` + `computePhaseStatus` helpers.
- **`phase-writers.ts`** — the seven phase writers (page / component
groups / components / incidents / maintenances / monitors /
subscribers). Each takes a `DB` explicitly so callers can thread a
pre-opened tx; failing resources get `status: "failed"` with an error
string rather than throwing.
- **`preview.ts`** — dry-run only; validates credentials, runs the
provider with `dryRun: true`, emits warnings.
- **`run.ts`** — the orchestrator. Now owns the `pageId` ownership
check (previously duplicated in the tRPC router) and emits exactly
**one** `import.run` audit row regardless of outcome so partial /
failed runs still show up in the audit signal. Deliberately *not*
wrapped in `withTransaction` — imports can span minutes across dozens
of writes and the existing UX is phase-level recovery.
The router shrinks from 124 lines to 28. It is now a thin `previewImport` /
`runImport` wrapper; the input schemas and all validation live in the
service. The router-level `TRPCError`-throwing `pageId` ownership check
moved into `runImport` so non-tRPC callers (Slack / future) get the
same guard.
- Provider validation failure: `TRPCError("BAD_REQUEST")` →
`ValidationError` → `TRPCError("BAD_REQUEST")`. Net-same.
- Unknown / wrong-workspace `pageId`: `TRPCError("NOT_FOUND")` →
`NotFoundError` → `TRPCError("NOT_FOUND")`. Net-same.
- Unit tests for `addLimitWarnings` / `clampPeriodicity` /
`computePhaseStatus` move to `packages/services/src/import/__tests__/`.
- Router integration tests (`packages/api/src/router/import.test.ts`)
that previously called `previewImport` / `runImport` directly to
override workspace limits now route through `makeCaller(limitsOverride)`
with an explicit `provider: "statuspage"` field. This also fixes four
pre-existing TypeScript errors where those calls were missing the
(required) provider discriminator.
- Biome `noRestrictedImports` override adds `packages/api/src/router/import.ts`.
- Subpath export `@openstatus/services/import` added.
- `@openstatus/importers` added to services deps; services `tsconfig.json`
bumped to `moduleResolution: "bundler"` so the importers package-exports
map resolves (same setting `packages/api` already uses).
Deletes `packages/api/src/service/import.ts` (1042 lines) and its test
file (463 lines). Only `telegram-updates.ts` remains in
`packages/api/src/service/` — that's slated for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(services/import): per-resource audit + Cubic fixes on #2111
Two changes folded together:
Every phase writer now emits one `emitAudit` row per *created*
resource, matching what the domain services emit for normal CRUD:
| Phase | Audit action
| --- | ---
| page | `page.create`
| componentGroups | `page_component_group.create`
| components | `page_component.create`
| monitors | `monitor.create`
| incidents | `status_report.create` + `status_report.add_update` per update
| maintenances | `maintenance.create`
| subscribers | `page_subscriber.create`
Skipped resources don't emit (their original create audit already
exists); failed resources don't emit (nothing was written); link-table
rows (statusReportsToPageComponents etc.) don't emit (edges, not
entities). Metadata always carries `source: "import"` + `provider:
<name>` + `sourceId: <provider-id>` so the audit trail traces back to
the source system.
The rollup `import.run` audit still fires at the end — the per-resource
rows give forensic granularity, the run-level row gives "this bulk
operation happened" without scanning the full summary blob.
To support this, phase writers now take a shared `PhaseContext = { ctx,
tx, provider }` instead of `(db, workspaceId, limits)` — the orchestrator
builds one `PhaseContext` per run and threads it through, giving each
writer access to `ctx.actor` for audit attribution. `statusReportUpdate`
writes now use `.returning({ id })` so the per-update audit can
attribute the right row.
- **`run.ts:130`** — phases after `page` kept their provider-assigned
status when `targetPageId` was falsy but the user option wasn't
`false`. Replaced the narrow `else if (option === false)` branches
with a plain `else → phase.status = "skipped"`, matching what
`subscribers` already did.
- **`run.ts:147`** — when the `components` phase hit `remaining <= 0`,
the phase was marked `"failed"` but individual resource statuses were
left stale with no error string. Each resource is now marked
`"skipped"` with `"Skipped: component limit reached (N)"`, matching
`writeMonitorsPhase`. Phase-level status becomes `"skipped"` too
(was `"failed"` — failed implied a writer error, this is really a
plan-limit pre-check).
- **`provider.ts`** — both `createProvider` and `buildProviderConfig`
had a `default:` that silently ran the Statuspage adapter for any
unknown provider name, which would mask a typo by handing a non-
Statuspage api key to the wrong adapter. Replaced with exhaustive
`case "statuspage"` + `never`-typed default throw.
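The exhaustive-switch pattern from the `provider.ts` fix looks roughly like this. The `ProviderConfig` shape and the `assertNever` helper are assumptions for illustration, not the actual signatures in the import package.

```typescript
type ProviderName = "statuspage";

// Hypothetical config shape; the real one carries provider-specific fields.
interface ProviderConfig {
  provider: ProviderName;
  apiKey: string;
}

// `value` is typed `never`: adding a new ProviderName without a matching
// case turns the call below into a compile-time error.
function assertNever(value: never): never {
  throw new Error(`Unknown import provider: ${String(value)}`);
}

function buildProviderConfig(provider: ProviderName, apiKey: string): ProviderConfig {
  switch (provider) {
    case "statuspage":
      return { provider, apiKey };
    default:
      // Reached only when an untyped caller passes an unexpected string.
      return assertNever(provider);
  }
}
```

A typo in the provider name now fails loudly instead of handing the api key to the Statuspage adapter.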
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(services): rename rpc/services → rpc/handlers (PR 10/N) (#2112)
The symbolic deliverable from the plan's "close the loop" PR. Renames
`apps/server/src/routes/rpc/services/` → `apps/server/src/routes/rpc/handlers/`
so the distinction between "the services layer" (owns business logic,
lives in `packages/services`) and "Connect transport handlers" (thin
proto → service → proto wrappers) is permanent and visible in the path.
Keeping the old name invites the next developer to "just add one small
thing" to a file under a `services/` folder months later; the rename
makes the layering explicit.
- `git mv` of the six domain subdirectories + their tests
(health / maintenance / monitor / notification / status-page /
status-report).
- `router.ts` import paths updated from `./services/*` to `./handlers/*`.
- Biome `overrides.include` paths updated to the new location.
- Added `apps/server/src/routes/rpc/handlers/health/**` to the scope —
the health handler has no db usage today; including it locks in that
invariant.
Rather than pretending the full "close the loop" deliverable is possible
today, the biome.jsonc comment now enumerates exactly what remains
unmigrated:
- `packages/api/src/router/statusPage.ts` — public viewer endpoints
under `publicProcedure`, no authed `ServiceContext`.
- `packages/api/src/router/{member,integration,monitorTag,
pageSubscriber,privateLocation,checker,feedback,stripe,tinybird,
email}.ts` — small domains not yet lifted.
- `apps/server/src/routes/rpc/handlers/monitor/**` — 6 jobType-specific
methods still on db.
- `apps/server/src/routes/rpc/handlers/status-page/**` — page CRUD is
migrated (PR 7), but components / groups / subscribers / viewer (13
methods) still import db, so the whole file stays out of scope.
- `apps/server/src/routes/v1/**` — the public HTTP API surface.
- `apps/server/src/routes/slack/**` except `interactions.ts` — tools,
handler, oauth, workspace-resolver still on db.
- `apps/server/src/routes/public/**` — public-facing HTTP routes.
Each of the above is its own PR-sized migration. The final consolidation
(broadening to `router/**` + dropping `@openstatus/db` from
`packages/api` and `apps/server`) is conditional on all of them
landing first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/import): use ctx workspaceId for page insert
`writePagePhase` was inserting with `data.workspaceId` — the value the
provider package round-tripped into resource data. Every other phase
writer (monitor / components / subscriber) already reads `workspaceId`
from `ctx.workspace.id`; this lines the page insert up with that
pattern. Defends against the (unlikely) case where a provider mapper
serialises the wrong workspace id into its output, since `ctx` is the
authoritative source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review findings on #2110
Six findings from Claude's review pass: four code fixes and two
documentation-only notes.
**P2 — `acceptInvitation` derives userId from `ctx.actor`.**
Was taking it from input: the email scoped *which* invitation could
be accepted, but not *who* the membership was inserted for. A caller
with the right token+email could insert a membership under an
arbitrary user id. Removed `userId` from `AcceptInvitationInput`;
derived from `tryGetActorUserId(ctx.actor)`, throws
`UnauthorizedError` for non-user actors. Mirrors the same pattern
applied to `createApiKey.createdById` in the Cubic pass. Router and
test updated accordingly.
**P2 — `getWorkspace` throws `NotFoundError` explicitly.**
`findFirst` + `selectWorkspaceSchema.parse(undefined)` was throwing
`ZodError` (→ `BAD_REQUEST`) instead of the `NotFoundError` shape
every other service uses. Unreachable in practice (ctx.workspace is
resolved upstream) but the error shape was the only outlier;
consistency matters for callers pattern-matching on error codes.
**P3 — `listApiKeys` filters null `createdById` before the IN query.**
The new `createApiKey` path enforces a non-null creator, but legacy
rows may have null. SQL's `x IN (NULL)` is `UNKNOWN` — technically
safe — but drizzle types model the array as `number[]`. Filtering
upfront keeps the types honest and sidesteps any future surprise.
**P3 — `deleteInvitation` guards `acceptedAt IS NULL`.**
The WHERE previously allowed hard-deleting *accepted* invitations,
wiping the "user was invited on X" breadcrumb. Added the
`isNull(acceptedAt)` guard + doc comment explaining the audit-trail
preservation intent.
**Doc-only — `deleteAccount` orphan comment.**
Non-owner memberships are removed, but owner memberships + owned
workspaces survive. Matches legacy behavior. Added a scope-note
docblock flagging that workspace cleanup is explicitly out of scope
(belongs to a future admin / scheduled job).
**Doc-only — `createInvitation` role comment.**
The invite insert lets `role` fall through to the schema default
(`member`). Matches legacy (which also only picked `email`).
Comment added so the absence reads as deliberate rather than
overlooked.
Minor: no dedicated test was added for the concurrent-accept race. The
conditional UPDATE + `ConflictError` path from the earlier P1 fix covers
it, and mocking the race reliably against SQLite is noisy and not worth
the test complexity. Documented in the related code comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude re-review findings on #2110
Four issues surfaced after the first round of fixes on this PR:
**P2 — `listApiKeys` crashes on all-legacy keys.**
After the null filter added in the previous commit, workspaces whose
keys all pre-date the services migration (every `createdById` null)
end up with `creatorIds === []`. Drizzle throws "At least one value
must be provided" on an empty `inArray`, taking the whole endpoint
down. Added an early return that maps `createdBy: undefined` when
there are no non-null creator ids to look up.
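A minimal sketch of the early return, using in-memory data. `findUsersByIds` is a hypothetical lookup that mimics the "empty list is an error" behaviour described above; the row shapes are also assumptions.

```typescript
type ApiKeyRow = { id: number; name: string; createdById: number | null };
type User = { id: number; email: string };
type EnrichedApiKey = ApiKeyRow & { createdBy: User | undefined };

// Stand-in for a `WHERE id IN (...)` query that rejects an empty id list.
function findUsersByIds(users: User[], ids: number[]): User[] {
  if (ids.length === 0) throw new Error("At least one value must be provided");
  return users.filter((u) => ids.indexOf(u.id) !== -1);
}

function enrichWithCreators(keys: ApiKeyRow[], users: User[]): EnrichedApiKey[] {
  const creatorIds = Array.from(
    new Set(keys.map((k) => k.createdById).filter((id): id is number => id !== null)),
  );
  // Early return: every key is a legacy row, so there is nothing to look up.
  if (creatorIds.length === 0) {
    return keys.map((k) => ({ ...k, createdBy: undefined }));
  }
  const byId = new Map(findUsersByIds(users, creatorIds).map((u) => [u.id, u] as const));
  return keys.map((k) => ({
    ...k,
    createdBy: k.createdById === null ? undefined : byId.get(k.createdById),
  }));
}
```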
**P2 — `getWorkspaceWithUsage` ZodError on missing row.**
Same `findFirst` + `selectWorkspaceSchema.parse(result)` pattern as
`getWorkspace`, but without the `NotFoundError` guard that got added
in the earlier pass. Added the guard. Also cleaned up the usage
block — no longer needs optional chaining once the narrowing fires.
**P2 — `deleteAccount` took `userId` from input.**
Completing the `createApiKey` / `acceptInvitation` pattern: account
deletion must target `ctx.actor`, never an arbitrary id. Dropped
`userId` from `DeleteAccountInput` (now an empty forward-compat
shape), derived inside the service via `tryGetActorUserId`, throws
`UnauthorizedError` for non-user actors. Router updated to stop
passing it.
**P3 — `createInvitation` dev-token log could leak in tests.**
Tightened the comment around the `process.env.NODE_ENV === "development"`
guard to flag that strict equality is load-bearing — bun:test sets
`NODE_ENV=test` and CI leaves it undefined, both of which correctly
skip the log. No behavior change, just a clearer contract so the
next reader doesn't loosen it.
Cubic's two findings on this review pass point at `packages/api/src/
router/import.ts` and `packages/services/src/import/limits.ts` — both
live in the next PR up the stack (#2111 / feat/services-import) and
will be addressed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2110
Four findings from the third Cubic review (now that #2111's import
domain is included in the #2110 diff via the stack):
**P2 — biome.jsonc notification handler scope.**
Only `notification/index.ts` was in the `noRestrictedImports` override.
Sibling files (`errors.ts`, `test-providers.ts`) were outside the
migration guard, so new db imports could land in them without the
lint failing. Broadened to `notification/**` and moved the two files
that *legitimately* still read db (`limits.ts` querying workspace
quotas, `converters.ts` needing db enum shapes for proto round-trip)
into the `ignore` list. Future siblings are enforced by default
rather than silently slipping through.
**P2 — `clampPeriodicity` unknown values returned too fast.**
`PERIODICITY_ORDER.indexOf("unknown") === -1` → `Math.max(-1, 0) === 0`
→ walk started at `"30s"` (the fastest tier). Could return an
interval faster than requested, violating the
"never-faster-than-requested" invariant. Now short-circuits to the
slowest allowed tier when the requested value isn't a known
periodicity. Added unit tests covering the unknown-value + empty-
allowed fallback paths.
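The fixed clamp can be sketched like this. The tier list, the empty-allowed fallback (slowest known tier) and the handling of a request slower than every allowed tier are all assumptions; only the unknown-value short-circuit is taken from the description above.

```typescript
// Assumed tier list, fastest first; the real PERIODICITY_ORDER may differ.
const PERIODICITY_ORDER = ["30s", "1m", "5m", "10m", "30m", "1h"] as const;
type Periodicity = (typeof PERIODICITY_ORDER)[number];

function clampPeriodicity(requested: string, allowed: readonly Periodicity[]): Periodicity {
  const allowedInOrder = PERIODICITY_ORDER.filter((p) => allowed.indexOf(p) !== -1);
  // Assumption: with nothing allowed, fall back to the slowest known tier.
  const slowestAllowed =
    allowedInOrder.length > 0
      ? allowedInOrder[allowedInOrder.length - 1]
      : PERIODICITY_ORDER[PERIODICITY_ORDER.length - 1];

  const requestedIndex = (PERIODICITY_ORDER as readonly string[]).indexOf(requested);
  // Unknown value: never start the walk at index 0 (the fastest tier).
  if (requestedIndex === -1) return slowestAllowed;

  // Walk from the requested tier toward slower tiers until one is allowed.
  for (let i = requestedIndex; i < PERIODICITY_ORDER.length; i++) {
    const tier = PERIODICITY_ORDER[i];
    if (allowed.indexOf(tier) !== -1) return tier;
  }
  // Assumption: nothing at or below the requested speed is allowed.
  return slowestAllowed;
}
```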
**P2 — component/monitor limit warnings counted total resources, not
quota-consuming inserts.**
If the import contained 4 components and 3 already existed (would be
skipped as duplicates), the warning claimed `"Only X of 4 can be
imported"` — but actually zero quota would be consumed by the 3
skips, so the real new-creation count might fit entirely. Reworded
to `"Only N new components may be created … some of the M in the
import may already exist and be skipped"`. Same treatment for the
monitors warning. Preview stays DB-light (no per-resource existence
checks); the warning now honestly conveys worst-case without
misleading users about what will actually happen. Test assertions
updated to match the new wording with substring matches that aren't
tied to the exact fraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page): address Claude review on #2109
Six items from Claude's review, going with the calls I leaned toward
in the question-back:
**P2 — tRPC `updateCustomDomain` wasteful `getPage` read.**
Was calling `getPage(id)` (fires 3 batched relation queries:
maintenances + components + groups) just to grab `customDomain`
before the Vercel add/remove calls. Added a narrow
`getPageCustomDomain` service helper — single indexed lookup,
workspace-scoped, returns the string directly. Router swapped over.
Service-layer authority preserved; no db reads leak into the router.
**P2 — Connect `updateStatusPage` slug-race code drift.**
Handler pre-checks slug to surface `slugAlreadyExistsError`
(`Code.AlreadyExists`). The `updatePageGeneral` service call
re-validates via `assertSlugAvailable` → `ConflictError` →
`Code.InvalidArgument` in the race where two callers both clear the
pre-check. Wrapped the call in `try/catch (ConflictError)` and
rethrow as `slugAlreadyExistsError(req.slug)` so gRPC clients keying
on the code get a consistent `AlreadyExists` whether they lose at
the pre-check or at the inner tx.
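The rethrow can be illustrated with plain error classes. Everything here is a stand-in: `updateSlug` plays the role of `updatePageGeneral`, and `AlreadyExistsError` represents whatever `slugAlreadyExistsError` builds in the Connect handler.

```typescript
class ConflictError extends Error {
  readonly code = "CONFLICT";
}
class AlreadyExistsError extends Error {
  readonly code = "ALREADY_EXISTS";
}

function slugAlreadyExistsError(slug: string): AlreadyExistsError {
  return new AlreadyExistsError(`Slug "${slug}" already exists`);
}

// Stand-in for the service call that re-validates the slug inside its tx.
function updateSlug(taken: Set<string>, slug: string): string {
  if (taken.has(slug)) throw new ConflictError(`Slug "${slug}" is taken`);
  taken.add(slug);
  return slug;
}

function handleUpdate(taken: Set<string>, slug: string): string {
  try {
    return updateSlug(taken, slug);
  } catch (err) {
    // Translate only the conflict; anything else propagates untouched.
    if ((err as { code?: string }).code === "CONFLICT") {
      throw slugAlreadyExistsError(slug);
    }
    throw err;
  }
}
```

The sketch matches on a `code` tag rather than `instanceof` so the check also works when the classes are transpiled to older targets.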
**P2 — Connect `createStatusPage` / `updateStatusPage` customDomain
without Vercel sync.** Pre-existing behaviour (the direct-db handler
had the same gap). Added a top-of-impl comment so it doesn't go
unnoticed — the fix is a shared transport-layer helper the Connect
handlers can reuse, out of scope for this migration PR to keep the
behavioural blast radius small for external API consumers.
**P3 — double cast `row as unknown as Page` in `create.ts`.**
The drizzle insert-returning type and the `Page` type diverge on
`authEmailDomains` / `allowedIpRanges` (raw comma-joined string vs
parsed `string[]`). Replaced the double casts with
`selectPageSchema.parse(row)` which normalises the row into the shape
callers expect. Cast-drift is now impossible to introduce silently.
**P3 — `void ConflictError;` workaround.**
Import was unused in `create.ts`; the `void` line was silencing the
unused-import warning rather than fixing the cause. Removed both.
**P3 — deprecated `passwordProtected` column.**
Added a doc block on `updatePagePasswordProtection` flagging that
the deprecated boolean column is intentionally not written here (the
v1 REST read path derives it from `accessType` via
`normalizePasswordProtected`). Prevents a future reader from
mistaking the omission for an oversight and writing two sources of
truth for the same signal.
Test coverage for the 5 untested update services
(`updatePagePasswordProtection`, `updatePageCustomDomain`,
`updatePageAppearance`, `updatePageLinks`, `updatePageConfiguration`)
deferred to a follow-up per Claude's "not blocking" marker — the
failing-edge behaviour is the critical bit, and
`updatePagePasswordProtection` already has indirect coverage through
the Connect handler tests on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address Claude review on #2108
Four items from Claude's review of the Connect notification handler
backfill:
**P3 — `protoDataToServiceInput` swallowed parse failures.**
`try { JSON.parse } catch { return {} }` was hiding any malformed
output from `protoDataToDb` (which would be a programmer error, not
user-input) behind a generic empty-object fallback. The downstream
`validateNotificationData` then failed with a far less specific
error. Let the throw propagate — `toConnectError` maps it to
`Code.Internal`, which is the signal we want for "the helper itself
misbehaved."
**P3 — `createNotification` response approximated the monitor IDs.**
Was echoing `Array.from(new Set(req.monitorIds))` on the happy path
(correct, since the service validates + throws on invalid) but the
approximation diverged from `updateNotification`'s re-fetch pattern.
Now re-fetches via `getNotification` after create so the response
reflects what's actually in the DB — one extra IN query per create,
eliminates the approximation entirely, makes both handlers
structurally identical.
**P3 — `sendTestNotification` bypassed `toConnectError`.**
Only handler in the impl without a `try { … } catch { toConnectError
}` wrap, so any thrown `ServiceError` / `ZodError` from
`test-providers.ts` fell through to the interceptor's generic catch
and surfaced with a less precise gRPC status. Wrapped for symmetry.
**P3 — `JSON.parse(existing.data)` null-unsafe.**
Drizzle infers `notification.data` as `string | null` (the column has
`default("{}")` but no `.notNull()`). A legacy row with `NULL` in the
column would crash `updateNotification` with `SyntaxError` during
the partial-update read-modify-write. Added `?? "{}"` fallback and a
comment pointing at the schema.
Cubic's single finding from the earlier pass (dedupe of
`req.monitorIds` in the create response) was already applied in
`b69ad13` and has now been superseded by the re-fetch above.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2108
Five findings from Cubic's second review cycle on this PR, all on
files that entered this branch via the #2109 (status-page) and #2111
(import) squash-merges stacked on top. Fixing here so the cumulative
state reaching main is clean.
**P1 — `page/create.ts` double-encoded JSON configuration.**
`page.configuration` is a drizzle `text("…", { mode: "json" })`
column — drizzle serialises objects automatically. Calling
`JSON.stringify(configuration)` first stored a raw JSON string in
the column, breaking any downstream read that expects an object
(e.g. the appearance merge at `update.ts:185`). Dropped the wrap;
drizzle handles it.
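The double-encoding effect is easy to reproduce with plain `JSON` calls. `writeJsonColumn` / `readJsonColumn` are hypothetical stand-ins for what a JSON-mode text column does on write and read.

```typescript
// Minimal stand-in for a JSON-mode text column: serialise on write, parse on read.
function writeJsonColumn(value: unknown): string {
  return JSON.stringify(value);
}
function readJsonColumn(stored: string): unknown {
  return JSON.parse(stored);
}

const configuration = { theme: "dark" };

// Bug: the caller stringifies first, so the column serialises a string.
const doubleEncoded = readJsonColumn(writeJsonColumn(JSON.stringify(configuration)));
// Fix: hand the object over and let the column do the serialising.
const correct = readJsonColumn(writeJsonColumn(configuration));

console.log(typeof doubleEncoded); // "string"
console.log(typeof correct); // "object"
```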
**P2 — `page/schemas.ts` slug + customDomain validation weaker than
insert schema.**
`NewPageInput.slug`, `GetSlugAvailableInput.slug`, and
`UpdatePageGeneralInput.slug` were `z.string().toLowerCase()` — no
regex, no min-length. `UpdatePageCustomDomainInput.customDomain`
was `z.string().toLowerCase()` — no format check. Meant the service
would accept malformed slugs / URLs that `createPage` would then
reject via `insertPageSchema`, or — worse — that `getSlugAvailable`
would confidently return "available" for garbage. Exported the
canonical `slugSchema` + `customDomainSchema` from
`@openstatus/db/src/schema/pages/validation` and reused them across
all four service inputs; db validation is now the single source of
truth for page slug/domain shape.
**P2 — `api/router/import.ts` nullish → optional contract narrowing.**
The service's `PreviewImportInput`/`RunImportInput` used `.optional()`
for the three provider page-id fields, which dropped the `null`
acceptance the legacy router had via `.nullish()`. Existing clients
sending `null` would have started hitting `Invalid input` errors
after the import migration landed. Added a `nullishString` transform
in the service schema that accepts `string | null | undefined` and
normalises to `string | undefined` before it reaches
`buildProviderConfig` — callers keep the broader contract, service
internals stay ignorant of `null`.
**P2 — `page/update.ts` empty array stored "" not null.**
`authEmailDomains?.join(",") ?? null` coerces `null`/`undefined` to
`null`, but `[].join(",")` returns `""` (empty string) which `??`
treats as a value. Callers sending `authEmailDomains: []` to clear
the column were persisting the empty string instead of nulling it —
misleading "present but blank" state. Switched to `|| null` on both
array-join outputs (`authEmailDomains` + `allowedIpRanges`) so the
three clearing inputs — `undefined`, `null`, `[]` — all land on DB
`NULL` while real non-empty joins pass through unchanged.
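The difference between the two operators shows up only for the empty array. The helper names below are hypothetical; the expressions are the ones quoted above.

```typescript
type MaybeList = string[] | null | undefined;

// Before: `??` only replaces null / undefined, so "" slips through.
function joinWithNullish(values: MaybeList): string | null {
  return values?.join(",") ?? null;
}

// After: `||` also replaces the empty string produced by [].join(",").
function joinOrNull(values: MaybeList): string | null {
  return values?.join(",") || null;
}
```

With `joinOrNull`, `undefined`, `null` and `[]` all produce `null`, while a non-empty list such as `["a.com", "b.com"]` still produces `"a.com,b.com"`.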
Test fixtures already use slugs ≥ 3 chars that match the regex, so
the tightened validation doesn't break any existing assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services/page-component): address Cubic + Claude review on #2107
Three fixes + one test, addressing both the original Cubic finding
and Claude's re-review pass.
**P2 — Discriminated union for `componentInput`.**
The flat `z.object` with `type: z.enum(["monitor", "static"])` and
optional `monitorId` let callers submit a "monitor" component with
no monitor id, or a "static" one with a monitor id attached. The DB
catches it with a `CHECK` constraint, but that surfaces as an opaque
SQLite CHECK failure instead of a clean `ZodError` at the service
boundary. Replaced with a `z.discriminatedUnion("type", [...])` that
requires `monitorId` on the "monitor" arm and omits it on the
"static" arm.
Fallout in `update-order.ts`: `c.monitorId` no longer exists on the
"static" arm after narrowing, so the spreads now use
`monitorId: c.type === "monitor" ? c.monitorId : null`. The defensive
`&& c.monitorId` guards on the already-narrowed monitor branches are
gone (TypeScript enforces the invariant the DB was catching late).
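At the type level the change looks roughly like this. The `name` field and the `toRow` helper are assumptions for illustration; only the `type` / `monitorId` relationship comes from the description above.

```typescript
// Each arm of the union states exactly which fields it carries.
type ComponentInput =
  | { type: "monitor"; monitorId: number; name: string }
  | { type: "static"; name: string };

type ComponentRow = {
  pageId: number;
  name: string;
  type: "monitor" | "static";
  monitorId: number | null;
};

function toRow(c: ComponentInput, pageId: number): ComponentRow {
  return {
    pageId,
    name: c.name,
    type: c.type,
    // `c.monitorId` only exists after narrowing to the "monitor" arm.
    monitorId: c.type === "monitor" ? c.monitorId : null,
  };
}
```

A "monitor" component without a `monitorId`, or a "static" one with it, no longer type-checks, so the invalid combinations are caught before they reach the `CHECK` constraint.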
**P2 — Sequential group insert instead of bulk `.returning()`.**
The bulk insert relied on drizzle/SQLite returning rows in the same
order they were inserted, so `newGroups[i]` could line up with
`input.groups[i]` when mapping components to their groups. True on
Turso today, but an implicit coupling — any driver change, batch
split, or upstream sort could silently reorder rows and land
components in the wrong group with no error signal. Switched to a
loop that captures each group id before moving on; the set size is
bounded by the status-page component-group plan cap so the extra
round trips are a rounding error.
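A sketch of the sequential variant against an in-memory table. `createGroupTable` and the input shape are hypothetical; the point is that each id is captured from its own insert instead of being matched up by array index afterwards.

```typescript
type GroupInput = { name: string; componentNames: string[] };
type GroupRow = { id: number; name: string };

// Hypothetical in-memory table standing in for insert + returning.
function createGroupTable() {
  let nextId = 1;
  const rows: GroupRow[] = [];
  return {
    rows,
    insertOne(name: string): GroupRow {
      const row: GroupRow = { id: nextId++, name };
      rows.push(row);
      return row;
    },
  };
}

function insertGroupsSequentially(
  table: ReturnType<typeof createGroupTable>,
  groups: GroupInput[],
): Map<string, number> {
  const groupIdByComponent = new Map<string, number>();
  for (const group of groups) {
    // The id comes from this group's own insert, not from result ordering.
    const { id } = table.insertOne(group.name);
    for (const componentName of group.componentNames) {
      groupIdByComponent.set(componentName, id);
    }
  }
  return groupIdByComponent;
}
```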
**Nit — removed dead `hasStaticComponentsInInput` guard.**
Both the "input has static components but none carry ids" and "input
has no static components at all" branches collapsed to the same
"drop all existing static components" action, so the outer
`hasStaticComponentsInInput` conditional was doing no work. Dropped
the variable and the nested branch.
**Test — upsert idempotency.**
The `onConflictDoUpdate` on `(pageId, monitorId)` was the riskiest
untested path — a regression would silently insert duplicate rows on
every re-invocation. Added a test that calls
`updatePageComponentOrder` twice on the same page with the same
`monitorId`, then asserts there's exactly one matching row and the
second call's values won.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(services): address latest Cubic pass on #2107 + unblock build
Eight Cubic findings from the second review plus one dashboard build
break from my earlier discriminated-union change.
**Build — router shape diverged from service discriminated union.**
`packages/api/src/router/pageComponent.ts` kept its own flat
`z.object({...})` input schema with `type: z.enum(["monitor",
"static"])` and `monitorId: z.number().nullish()`. After the service
switched to `z.discriminatedUnion("type", [...])`, TS couldn't
reconcile the two — dashboard build failed. Replaced the local
schema with the service's exported `UpdatePageComponentOrderInput`
so both layers share the canonical shape.
**P1 — page router: validate customDomain before Vercel call.**
The router input was `z.string().toLowerCase()` (no format check) and
the service's `customDomainSchema` only fired inside
`updatePageCustomDomain`, *after* the Vercel add/remove mutations.
A malformed domain could be added to Vercel, then rejected by the
service, leaving Vercel/db state drifted. Switched the router input
to the service's `UpdatePageCustomDomainInput` so format validation
runs at tRPC input parsing, before any Vercel call.
**P1 — `listApiKeys` leaked `hashedToken`.**
`SELECT *` returned every column including the bcrypt hash of each
key's one-time token, which has no business appearing in a list
response. Replaced with an explicit column select that omits
`hashedToken`. New `PublicApiKey` type (`Omit<ApiKey,
"hashedToken">`) is the return shape; exported from the barrel.
**P2 — `acceptInvitation` eager workspace load + second fetch.**
The initial `findFirst` already loaded the workspace via `with: {
workspace: true }`, but the return value re-fetched it by id. Use
the joined value directly — one round-trip instead of two, and
eliminates the read-skew window where a just-renamed workspace
could appear with a different name in each fetch.
**P2 — `import.run` audit entityId 0.**
`entityId: targetPageId ?? 0` wrote a ghost `page 0` reference to
the audit trail when the page phase failed before producing an id.
Entity attribution now falls back to the workspace (`entityType:
"workspace"`, `entityId: ctx.workspace.id`) when no target page is
in play — real rollback signal, no phantom foreign key.
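
The fallback can be sketched as a small helper. The function name and the `AuditEntity` type are hypothetical; only the `entityType` / `entityId` fields come from the description above.

```typescript
type AuditEntity =
  | { entityType: "page"; entityId: number }
  | { entityType: "workspace"; entityId: number };

// Attribute the audit entry to the page when one exists, otherwise to
// the workspace, instead of writing a placeholder id of 0.
function auditEntityFor(
  targetPageId: number | null | undefined,
  workspaceId: number,
): AuditEntity {
  if (targetPageId == null) {
    return { entityType: "workspace", entityId: workspaceId };
  }
  return { entityType: "page", entityId: targetPageId };
}
```

Every entry then points at a row that actually exists, whether or not the page phase produced an id.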
**P2 — `page-components` limit scoped per-page, not workspace.**
`page-components` is a workspace-wide cap (see
`page-component/update-order.ts` — counts every component across
every page). The import preview and run's component check were
scoping the existing count to `targetPageId`, which understated
pressure and would let imports push past the cap at write time.
Both sites now count workspace-wide.
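
A toy illustration of how per-page scoping understates the count. The row shape, the cap of 5 and the helper names are all made up for the example.

```typescript
// Hypothetical rows standing in for the page-component table.
type PageComponentRow = { id: number; pageId: number; workspaceId: number };

// Old scoping: only components already on the import's target page.
function countForPage(rows: PageComponentRow[], pageId: number): number {
  return rows.filter((row) => row.pageId === pageId).length;
}

// New scoping: every component on every page in the workspace.
function countForWorkspace(rows: PageComponentRow[], workspaceId: number): number {
  return rows.filter((row) => row.workspaceId === workspaceId).length;
}

function exceedsCap(existing: number, incoming: number, cap: number): boolean {
  return existing + incoming > cap;
}

const componentRows: PageComponentRow[] = [
  { id: 1, pageId: 1, workspaceId: 10 },
  { id: 2, pageId: 1, workspaceId: 10 },
  { id: 3, pageId: 2, workspaceId: 10 },
  { id: 4, pageId: 2, workspaceId: 10 },
  { id: 5, pageId: 2, workspaceId: 10 },
];
const cap = 5; // hypothetical plan limit
const incoming = 2; // components the import wants to add to page 1

// Per-page: 2 existing + 2 incoming = 4, looks fine.
const perPageVerdict = exceedsCap(countForPage(componentRows, 1), incoming, cap);
// Workspace-wide: 5 existing + 2 incoming = 7, over the cap.
const workspaceVerdict = exceedsCap(countForWorkspace(componentRows, 10), incoming, cap);
```

The two verdicts disagree for the same import, which is the gap the fix closes.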
**P2 — `writeIncidentsPhase` lacked idempotency.**
Every other phase writer checks for an existing row before
inserting (page by slug, monitor by url, component by name,
subscriber by email); `writeIncidentsPhase` inserted
unconditionally. A re-run would duplicate status reports on every
pass. Added an existence check by `(title, pageId, workspaceId)`
matching the convention.
**P2 — `writeMaintenancesPhase` lacked idempotency.**
Same pattern. Added a check by `(title, pageId, from, to,
workspaceId)` — the `from/to` pair is load-bearing because
maintenance titles recur ("DB upgrade") across unrelated windows.
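
A minimal in-memory sketch of the existence check, using the maintenance key. The `Maintenance` shape and function names are hypothetical, and an array stands in for the table.

```typescript
type Maintenance = {
  title: string;
  pageId: number;
  workspaceId: number;
  from: Date;
  to: Date;
};

// Two rows are the same maintenance only if the whole key matches,
// including the time window.
function sameMaintenance(a: Maintenance, b: Maintenance): boolean {
  return (
    a.title === b.title &&
    a.pageId === b.pageId &&
    a.workspaceId === b.workspaceId &&
    a.from.getTime() === b.from.getTime() &&
    a.to.getTime() === b.to.getTime()
  );
}

// Check for an existing row before inserting, so a re-run is a no-op.
function writeMaintenance(
  store: Maintenance[],
  incoming: Maintenance,
): "created" | "skipped" {
  if (store.some((row) => sameMaintenance(row, incoming))) {
    return "skipped";
  }
  store.push(incoming);
  return "created";
}
```

Dropping `from` / `to` from the key would make a second "DB upgrade" in a later window look like a duplicate and get skipped. Against a real database, check-then-insert also needs a transaction or unique constraint to be safe under concurrency; the sketch ignores that.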
**P2 — `writeComponentsPhase` silent monitor→static fallback…

## Summary

Stacked on #2108. Seventh domain. Migrates the 13 authoring procedures of `pageRouter` and the 5 page-CRUD methods of the Connect `status-page` handler onto `@openstatus/services/page`. The public viewer domain (`statusPage.ts`) and the remaining 13 Connect methods (components / groups / subscribers / viewer) still warrant dedicated follow-ups.

## What lands

### Services (`packages/services/src/page/`)

- `createPage` / `newPage`: full vs minimal create. Both enforce the `status-pages` plan cap; `createPage` also re-applies the per-access-type plan gates (password-protection / email-domain / ip-restriction / no-index).
- `deletePage`: FK cascade takes care of components / groups / reports / subscribers. tRPC wrapper swallows `NotFoundError` for idempotency.
- `listPages`: returns pages with their `statusReports` (batched; scoped to workspace).
- `getPage`: returns page + `maintenances` / `pageComponents` / `pageComponentGroups` (matches the old zod shape exactly).
- `getSlugAvailable`: pure check against `subdomainSafeList` + DB.
- `updatePageGeneral`: re-checks slug uniqueness only when the slug actually changes.
- `updatePageCustomDomain`: persists the DB change and returns the previous domain so the Vercel-diff logic at the transport layer can avoid a separate read.
- `updatePagePasswordProtection`: same gates as `create`.
- `updatePageAppearance`, `updatePageLinks`, `updatePageLocales` (plan-gated on `i18n`), `updatePageConfiguration`.

### tRPC (`packages/api/src/router/page.ts`)

All 13 procedures are thin wrappers. The `updateCustomDomain` procedure orchestrates:

1. `getPage` (via service) reads the current domain.
2. `addDomainToVercel` / `removeDomainFromVercel`: transport-layer external calls, stay at tRPC (same philosophy as `testHttp` / `testTcp` / `testDns` for monitor).
3. `updatePageCustomDomain` (via service) persists the change.

### Connect (`apps/server/src/routes/rpc/services/status-page/index.ts`)

Page CRUD methods (`createStatusPage` / `getStatusPage` / `listStatusPages` / `updateStatusPage` / `deleteStatusPage`) now call into the services. The other 13 methods (components / groups / subscribers / viewer) still read the DB directly; they're separate domains for follow-up PRs.

- `updateStatusPage` loads the existing page via `getPage`, then orchestrates per-section updates (`updatePageGeneral`, `updatePageLinks`, `updatePageAppearance`, `updatePageCustomDomain`, `updatePageLocales`, `updatePagePasswordProtection`) inside a single transaction (`withTransaction` + shared `ctx.db = tx`), so partial failures can't leave the page half-updated.
- `statusPageNotFoundError` (catches `NotFoundError` → rethrows with page-id metadata) and `slugAlreadyExistsError` (pre-check before the service call: `AlreadyExists` + metadata instead of the service's generic `ConflictError` → `InvalidArgument`).
- `PermissionDenied` path stay at the handler; they don't exist in the zod insert schema and the error codes would change if deferred to the service.
- `listStatusPages` fetches via the service and paginates in-memory; status-page quota is bounded per workspace so the enrichment overhead is negligible for this migration.
- `serviceToConverterPage` bridges the shape mismatch between the service's parsed `Page` (arrays for `authEmailDomains` / `allowedIpRanges`) and the converter's raw-DB shape (comma-joined strings, still used by the 13 unmigrated methods).

### Enforcement

- `packages/api/src/router/page.ts`. Router imports the drizzle insert schema via the services re-export (`CreatePageInput`) so the db-import ban applies cleanly.
- `status-page/index.ts` is not added to the Biome scope yet: the file still imports from `@openstatus/db` for the 13 legacy methods, so the override would light up the whole file. Will land in the follow-up that migrates the remaining methods.
- `@openstatus/services/page`.
- `@openstatus/locales` to the services deps (used by `UpdatePageLocalesInput`).

### Tests

`__tests__/page.test.ts`:

- `newPage` happy / reserved slug / duplicate slug
- `createPage` monitor attachment + cross-workspace monitor → `ForbiddenError`
- `updatePageGeneral` rename, duplicate-slug `ConflictError`, cross-workspace `NotFoundError`
- `updatePageLocales` plan-gate `LimitExceededError` on free plan (no `i18n`)
- `list` / `get` / `getSlugAvailable` workspace isolation + reserved handling
- `delete` cross-workspace `NotFoundError`

## Deliberately out of scope

- `packages/api/src/router/statusPage.ts`: ~1280 lines of public viewer endpoints (`get`, `getLight`, `getMaintenance`, `getUptime`, `getReport`, `getMonitors`, `subscribe`, `unsubscribe`, `verifyPassword`, …). They're `publicProcedure`s that don't run through an authenticated `ServiceContext`; either a dedicated public-ctx surface in services or a separate subscription service is needed. Follow-up PR.
- `status-page/index.ts`: components / groups / subscribers / viewer. Same pattern as the Connect monitor deferral; follow-up with its own scope.

## Test plan

- `/status-pages/create`, `/status-pages/[id]/settings` (general / custom domain / appearance / links / locales / password), `/onboarding` page creation
- `page.create` from onboarding still attaches monitors as components
- `StatusPageService.*` CRUD smoke (buf studio or equivalent): create → get → list → update (each section) → delete

🤖 Generated with Claude Code