feat: new-user onboarding overhaul + realtime-signal & mobile fixes #362
Reduce the initial hurdle new users face around the catalog and health checks, plus follow-up fixes surfaced during review.

Onboarding:
- Atomic healthcheck.createAndAssign RPC; AI propose tool prefers HTTP over a script and creates + assigns in one step; gated onboarding system-prompt.
- catalog.createEnvironment / setSystemEnvironments AI tools + listEnvironments.
- FirstCheckWizard (new @checkstack/ui Stepper): system + HTTP check + assignment in one guided flow, from the Health Checks empty state and a "Quick start" header button; new-or-existing system; environment nudges.
- Docs: enable Mermaid, add architecture/onboarding diagrams, clarify assignments and environments.

AI chat:
- askOperator tool: clickable answer chips instead of plaintext questions.

Fixes:
- catalog + healthcheck now broadcast realtime signals on mutations, so out-of-band writes (AI, GitOps, other pods/users) refresh open clients; this fixes a stale-cache 404 on the system page after AI-created systems.
- Mobile: nav drawer bound to the dynamic viewport (scrolls to the last item); navbar wordmark hidden on small screens.
- associateSystem no longer drops the per-assignment notificationPolicy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
🦋 Changeset detected
Latest commit: 122ee9a
The changes in this PR will be included in the next version bump. This PR includes changesets to release 142 packages.
Covers the APP_DOC_SLUGS.environments addition flagged by the changeset coverage check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
❌ PR Checks Failed
❌ Typecheck Errors
How to fix: Run

❌ Test Failures
How to fix: Run

❌ Security Audit Failures
How to fix: Only fixable findings (a patched version exists) fail the build; unfixed findings are surfaced as warnings, not gated. Run

❌ E2E Failures
How to fix: These are the Playwright end-to-end tests. Reproduce locally with

@enyineer The above code quality issues were found in this PR. Please fix them before merging.
CI caught two issues from the docs and onboarding changes:
- The bundled AI docs index (core/ai-backend/src/generated/docs-index.ts) was stale after the docs markdown changes; regenerate it (fixes the Typecheck docs-index check and the drift-guard test).
- The Health Checks empty-state description started with "No health checks yet", colliding with ListEmptyState's title of the same text (strict-mode violation in onboarding.empty.spec.ts). Reword the description and update the e2e spec to the new copy + the new guided-setup buttons.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
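To illustrate why the duplicated "No health checks yet" text tripped the e2e spec, here is a minimal sketch of a strict text lookup. It is a simplified model, not Playwright's implementation: `resolveStrictByText` and the two arrays of rendered strings are hypothetical, and the reworded description is made-up copy, not the wording used in this PR.

```typescript
// Simplified model of a strict text locator (hypothetical helper, not Playwright's API).
// It performs a case-insensitive substring match and refuses to resolve
// unless exactly one rendered text matches.
function resolveStrictByText(renderedTexts: string[], query: string): string {
  const needle = query.toLowerCase();
  const matches = renderedTexts.filter((text) =>
    text.toLowerCase().includes(needle),
  );
  const [only] = matches;
  if (matches.length !== 1 || only === undefined) {
    throw new Error(
      `strict mode violation: "${query}" resolved to ${matches.length} elements`,
    );
  }
  return only;
}

// Before the fix: title and description both contain the queried text.
const emptyStateBefore = [
  "No health checks yet",
  "No health checks yet. Create one to start monitoring.", // hypothetical copy
];

// After the fix: only the title contains the queried text.
const emptyStateAfter = [
  "No health checks yet",
  "Create your first check to start monitoring.", // hypothetical copy
];
```

Resolving "No health checks yet" against `emptyStateBefore` throws because two strings match, while the same query against `emptyStateAfter` resolves to the title alone.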
❌ PR Checks Failed
❌ Security Audit Failures
How to fix: Only fixable findings (a patched version exists) fail the build; unfixed findings are surfaced as warnings, not gated. Run

@enyineer The above code quality issues were found in this PR. Please fix them before merging.
Trivy flagged 6 fixable CVEs in pre-existing build-tooling deps (astro, vite, js-yaml, devalue). bun only honors flat overrides, so the conflicting majors (frontend vite@8 vs astro's vite@7; changesets read-yaml-file's js-yaml@3 vs the v4 consumers) can't be pinned surgically. A fresh resolve against the existing semver ranges bumps each consumer to its latest in-range version:
- astro 6.4.2 -> 6.4.8 (CVE-2026-54298, CVE-2026-54299)
- vite 7.3.2 -> 7.3.6 (CVE-2026-53632, CVE-2026-53571; frontend vite@8 kept)
- js-yaml 4.1.1 -> 4.2.0 (CVE-2026-53550; read-yaml-file's js-yaml@3 kept)
- devalue 5.8.0 -> 5.8.1 (CVE-2026-42570)

Lockfile-only change; package.json ranges unchanged. typecheck, lint, and the docs build pass locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
❌ PR Checks Failed
@enyineer The above code quality issues were found in this PR. Please fix them before merging.
The full lockfile re-resolve cleared the 6 build-dep CVEs but bumped the frontend's vite/rolldown (and at least one further transitive dep) and broke the frontend build (E2E Build). Pinning vite to 8.0.16 and rolldown to 1.0.3 did not restore it, and the bundler swallows the underlying error messages, so a clean fix isn't tractable inside this PR. Revert bun.lock to the working state; the pre-existing CVEs are better handled in a dedicated, carefully-tested dependency PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
❌ PR Checks Failed
❌ Security Audit Failures
How to fix: Only fixable findings (a patched version exists) fail the build; unfixed findings are surfaced as warnings, not gated. Run

@enyineer The above code quality issues were found in this PR. Automated fixes have not resolved them after 3 attempts. Manual intervention is required.
Why
A new user tried to create their first system + health check via the AI assistant and hit several walls: the assistant authored a script check instead of HTTP and couldn't finish the job (a created check was left unassigned, so it never ran), the docs didn't explain assignments or environments, and there were no real architecture diagrams. This PR drastically lowers that first-run hurdle, plus the follow-up fixes surfaced while reviewing it.
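The failure mode above, a check that is created but never assigned and so never runs, is what an atomic create-and-assign step rules out. The following is a minimal sketch of that idea using an in-memory store. Every name in it (`InMemoryStore`, `createAndAssign`, the field names, the id scheme) is a hypothetical stand-in and not the actual Checkstack RPC or schema.

```typescript
// Hypothetical in-memory model; these types are illustrative, not Checkstack's schema.
type HealthCheckConfig = { id: string; name: string; url: string };
type Assignment = { configId: string; systemId: string; running: boolean };
type CreateAndAssignInput = { name: string; url: string; systemId: string };

class InMemoryStore {
  configs = new Map<string, HealthCheckConfig>();
  assignments: Assignment[] = [];
  systems = new Set<string>();

  // Run `fn`; if it throws, restore the state captured before it started.
  transaction<T>(fn: () => T): T {
    const configsBefore = new Map(this.configs);
    const assignmentsBefore = [...this.assignments];
    try {
      return fn();
    } catch (err) {
      this.configs = configsBefore;
      this.assignments = assignmentsBefore;
      throw err;
    }
  }
}

// Create the check config and its assignment together, or not at all.
function createAndAssign(
  store: InMemoryStore,
  input: CreateAndAssignInput,
): Assignment {
  return store.transaction(() => {
    const config: HealthCheckConfig = {
      id: `check-${store.configs.size + 1}`,
      name: input.name,
      url: input.url,
    };
    store.configs.set(config.id, config);

    // If the target system is missing, the throw rolls back the config above.
    if (!store.systems.has(input.systemId)) {
      throw new Error(`unknown system: ${input.systemId}`);
    }

    const assignment: Assignment = {
      configId: config.id,
      systemId: input.systemId,
      running: true, // started immediately, so no dormant check remains
    };
    store.assignments.push(assignment);
    return assignment;
  });
}
```

In this sketch, calling `createAndAssign` with an unknown `systemId` throws and leaves `store.configs` unchanged, so a config can never exist without its assignment.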
Onboarding
- `healthcheck.createAndAssign` RPC (transaction): creates a config + assignment and starts it immediately, so the common 1-1 case can never leave a dormant check.
- `healthcheck.propose` now prefers the HTTP strategy for a URL (never a script unless asked) and, given `assignToSystemId`, creates + assigns + starts in one approval. A gated onboarding system-prompt playbook steers HTTP, ask-before-guessing, and one-system-many-environments.
- `catalog.createEnvironment`, `catalog.setSystemEnvironments`, `catalog.listEnvironments` projection.
- `FirstCheckWizard` (new `@checkstack/ui` `Stepper`): name/pick a system → paste a URL → review → it creates the system, an HTTP check (with a `statusCode == 200` assertion), and the assignment in one guided flow. Reachable from the Health Checks empty state and an always-on "Quick start" header button; supports new or existing systems; inline environment nudges (linked to docs).
- Docs: enable Mermaid (`astro-mermaid`, client-side); add architecture/onboarding diagrams; add a "what is an assignment / why a check needs one" callout + an Assignments concept section; stop endorsing one-system-per-environment.

AI chat
- `askOperator` tool: the assistant asks discrete-choice questions as clickable chips (plus a free-text box) instead of a plaintext list; clicking sends the answer. Built on the existing confirm-card data-card pattern.

Fixes
- catalog + healthcheck now broadcast realtime signals on mutations, so out-of-band writes (AI, GitOps, other pods/users) refresh open clients (`SignalAutoInvalidator`). This fixes a stale-cache 404 on the system page after an AI-created system. (incident/maintenance/automation/slo/dependency already did this.)
- `associateSystem` no longer silently drops the per-assignment `notificationPolicy` (regression test added).
- Mobile: the nav drawer (`Sheet`) was bound to the layout viewport, so its bottom hid behind the browser chrome; it is now `h-[100dvh]`, so it scrolls to the last item. The "Checkstack" wordmark is hidden below `sm` to de-clutter the navbar.

Tests & verification
- `bun run typecheck` and `bun run lint` clean.
- Docs build (`astro build`).

Note
The wizard UI and the two mobile changes compile, lint, and unit-test, but the live UI click-through (and the `askOperator` model round-trip) need a running app + a configured model to eyeball.

🤖 Generated with Claude Code
https://claude.ai/code/session_011v4Nr8YnMGMRziVhgqTc1z
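The realtime-signal fix described under Fixes can be pictured as a broadcast-on-mutation pattern: the server announces every write, and open clients drop the matching cache entry so the next read refetches. The sketch below is a minimal in-process model under that assumption. `SignalBus`, `CatalogServer`, and `ClientCache` are hypothetical names, and this is not how `SignalAutoInvalidator` is actually implemented.

```typescript
// Hypothetical model of signal-driven cache invalidation (not Checkstack's code).
type Signal = { entity: string };
type Listener = (signal: Signal) => void;

class SignalBus {
  private listeners = new Set<Listener>();

  subscribe(listener: Listener): void {
    this.listeners.add(listener);
  }

  broadcast(signal: Signal): void {
    this.listeners.forEach((listener) => listener(signal));
  }
}

class CatalogServer {
  private systems: string[] = [];
  private bus: SignalBus;

  constructor(bus: SignalBus) {
    this.bus = bus;
  }

  listSystems(): string[] {
    return [...this.systems];
  }

  // Every mutation broadcasts, regardless of who triggered the write.
  createSystem(id: string): void {
    this.systems.push(id);
    this.bus.broadcast({ entity: "systems" });
  }
}

class ClientCache {
  private entries = new Map<string, string[]>();

  constructor(bus: SignalBus) {
    // Drop the cached entry for an entity as soon as a signal for it arrives.
    bus.subscribe((signal) => {
      this.entries.delete(signal.entity);
    });
  }

  // Return the cached value if present; otherwise load and cache it.
  read(entity: string, load: () => string[]): string[] {
    const cached = this.entries.get(entity);
    if (cached !== undefined) return cached;
    const fresh = load();
    this.entries.set(entity, fresh);
    return fresh;
  }
}
```

Without the broadcast in `createSystem`, a client that had already cached the system list would keep serving the old list and fail to find a newly created system, which mirrors the stale-cache 404 this PR fixes.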