Post-cutover plan: chat.recoupable.com → /api/chat/workflow #1747

@sweetmantech

Description

Tracking issue for the chat.recoupable.com → api.recoupable.com/api/chat/workflow cutover. Mirrors the open-agents-side cutover (open-agents#43, merged) that took sandbox.recoupable.com to the same workflow endpoint.

Goal

Migrate chat.recoupable.com off its legacy /api/chat endpoint (handleChatStream-backed) onto api's durable workflow endpoint /api/chat/workflow (runAgentWorkflow-backed). After cutover, chat.recoupable.com and sandbox.recoupable.com share the same chat infrastructure: sessions / chats / chat_messages tables, runAgentWorkflow, ensurePersonalRepo for clone URL construction.

Status

Done

  • Vertical-slice cutover for new chats (chat#1748). Merged to test as d8c007f0 on 2026-05-27.
  • Workflow path verified end-to-end on preview: POST /api/sessions → POST /api/sandbox → POST /api/chat/workflow (200 streaming). URL navigates to /chat/<api.chat.id>. React StrictMode safe: exactly one session + one sandbox per mount.
  • Tool-output rendering — fixed in chat#1751. Root cause: AI SDK version skew — chat was on ai@6.0.0-beta.99, whose tool-output-available strict schema lacks providerMetadata; api emits that field. Bumped to ai@6.0.165 / @ai-sdk/react@3.0.167.
  • Org-session path verified end-to-end — switching the active org changes the new-chat session's organizationId, resolves cloneUrl to recoupable/<orgId>, and provisions an org-scoped sandbox. Verified on test across Personal / Myco WTF / Rostrum Pacific (0 / 19 / 43 artists in their sandbox filesystems).
  • Phase 2 — data backfill done. Migrated 23,252 rooms → sessions+chats and 46,212 well-formed memories → chat_messages, IDs preserved (chats.id == rooms.id, sessions.id = uuidv5(room.id, …)). 347 deleted-account rooms intentionally skipped. Reconciled per-room against source. Script in api#623; chat_messages.parts shape fix in api#627 with one-time SQL UPDATE on prod (46,192 rows reconstructed in-place). A final idempotent re-run is owed before the cutover lands on main to catch stragglers.
  • Workflow read receipts: POST /api/sessions/{sid}/chats/{cid}/read (api#624) writes chat_reads.last_read_at, completing the hasUnread read+write loop.
  • Chat-side canonical route + history reader (chat#1752) — adds /sessions/[sid]/chats/[cid] route; getChatMessages now hits GET /api/sessions/{sid}/chats/{cid} (which reads chat_messages), with useMessageLoader threading sessionId through.
  • Canonical URL persists on send (chat#1757): silentlyUpdateUrl emits /sessions/{sid}/chats/{cid} instead of /chat/{id}. Fixes the URL-bounce-on-send regression that chat#1752 introduced on its own. Verified on preview from both /chat (NewChatBootstrap) and existing canonical-URL chats.
  • sessions.artist_id column + inline backfill (database#27) — schema change + inline backfill via the same uuidv5(NAMESPACE, room.id) derivation the api script uses. 17,985 of 23,519 sessions populated; FK validated; zero orphans; sessions_artist_id_idx exists. Verified pre-merge via transactional dry-run on prod.
  • API docs reflect new artistId shape (docs#228) — CreateSessionRequest.artistId (optional uuid) and Session.artistId (nullable uuid) added to the shared OpenAPI schema; surfaces on every Session response page (POST /api/sessions, GET /api/sessions/{id}, PATCH /api/sessions/{id}).
  • POST /api/sessions accepts + persists artistId (api#628) — validateCreateSessionBody, buildSessionInsertRow, createSessionHandler, toSessionResponse, and DB types updated. Verified end-to-end on preview against the merged docs schema: all three Session-returning endpoints (POST/GET/PATCH) match the documented shape exactly with zero drift.
  • API docs reflect new GET /api/chats shape + working filter (docs#227) — ChatRoom adds artistId (nullable uuid, from sessions.artist_id); artist_account_id query param redocumented as a real filter (no longer "no-op").
  • GET /api/chats artist projection + filter + Bearer-admin scope fix (api#626, merged as 94821f69) — response shape now matches docs#227 ({id, title, accountId, sessionId, artistId, updatedAt}); artist_account_id filter wired through selectChatsWithSessions; two cubic P1 fixes folded in (Recoup-admin scope now membership-based via account_organization_ids so Bearer-authed admins get the same scope as x-api-key org admins; 500 catch no longer leaks raw exception messages); extracted lib/organizations/isRecoupAdmin.ts per DRY review. Verified end-to-end on preview with both x-api-key and Bearer — every wire field matches the docs schema, no drift.
  • scripts/backfill/migrateRoom.ts straggler-aware artist_id (api#629, merged as a8c91b0e) — insertSession row now carries artist_id: room.artist_id so the owed pre-promotion straggler re-run no longer silently produces artist_id = NULL sessions for rooms that had one. Completes the write-path work begun in api#628. Verified end-to-end on prod: pre-fix run on room 8c7f8a9c produced a session with artist_id = NULL (reproducing the bug); post-fix run on room 83e25ef9 produced a session with the correct artist_id. Total existing drift across 23.5k sessions: 1 row (our test orphan), patched via single-row UPDATE — the bulk fleet was already correct because database#27's inline backfill covered everything migrated before the column existed.
  • HomePage / migration to NewChatBootstrap (chat#1760, merged as 6d97bb19) — / now mounts <NewChatBootstrap> instead of <Chat> so it provisions a session + sandbox before render and routes sends through POST /api/chat/workflow (not legacy POST /api/chat). useAutoLogin hoisted into NewChatBootstrap; LegacyAutoLogin wrapper added in chat.tsx for the /chat/[roomId] route (scheduled for deletion alongside that route in step 4). Verified live: bare test agent reported "I don't have a tool for file listing"; this PR's preview successfully ran the bash tool and listed README.md. Also re-validates chat#1757's canonical URL behavior from the / surface (URL rewrites to /sessions/{sid}/chats/{cid} on first send).
  • Exclude archived sessions from GET /api/chats (api#630, merged as 0f8bdf19) — selectChatsWithSessions now applies .neq("session.status", "archived") unconditionally, so chats whose owning session has been archived (via PATCH /api/sessions/{sid} { status: "archived" }) disappear from the listing. Covers both the REST endpoint and the MCP get_chats tool through the same chokepoint. Wire shape unchanged — same 6 ChatRoom fields, just fewer rows. Verified live: prod test returned 14 chats including the archived "Introduction to Coding"; preview returned 13 with that row gone; unarchive → 14 (row re-appears); re-archive → 13 (row gone again). Unblocks the user-visible "Delete chat" UX in chat#1763.
  • Sidebar delete + rename use session-scoped api (chat#1763, merged as 3b93c191) — rename now hits PATCH /api/sessions/{sid}/chats/{cid} with { title } (was legacy PATCH /api/chats with { chatId, topic }), updating the canonical chat row; delete archives the owning session via new lib/sessions/archiveSession.ts calling PATCH /api/sessions/{sid} { status: "archived" } (instead of removing the chat row). Archive triggers stopSandboxOnArchive on the api side so the running sandbox is torn down. Replaced chat#1758 which was auto-closed when its base branch was deleted by the chat#1756 merge. Verified end-to-end on preview after api#630 landed: rename → row title updates and bubbles to top; delete → confirmation modal → PATCH /api/sessions { status: "archived" } → next auto-refetch returns 13 chats instead of 14, row vanishes within ~1s. Reversible from admin side via status: "running".
  • Sidebar consumes session-scoped chat listing with artist-scoped filter (chat#1756, merged as 44550b51): getConversations now hits GET /api/chats?artist_account_id={id} server-side instead of client-side filtering on item.artist_id; the artist id is part of the react-query key so switching artists triggers a fresh fetch. Sidebar rows click through to the canonical /sessions/{sid}/chats/{cid} URL (now that chat#1752's route exists, no more 404s). Two review-comment follow-ups folded in: moved lib/getConversations.tsx → lib/chat/getConversations.tsx per the domain-folder convention; replaced the manual TS-cast response decode with a zod schema mirroring docs#227 ChatRoom so a wire-shape drift now raises at the boundary instead of corrupting UI state. Verified live: initial load fired the artist-scoped query (14 chats, every row's artistId matched), artist switch triggered 5 distinct refetches with new IDs, empty state rendered correctly for artists with no chats.
  • Centralize useAutoLogin into UserProvider (chat#1761, merged as 36a3e140; supersedes and closes chat#1753) — <UserAutoLogin /> child component inside UserProvider fires useAutoLogin() once for the entire app. Three follow-up commits during review collapsed every remaining call site: per-page calls removed from CatalogSongsPage, CatalogsPage, TasksPage; LegacyAutoLogin wrapper deleted from chat.tsx (scaffolding that chat#1760 had added for the /chat/[roomId] route); useAutoLogin() call deleted from NewChatBootstrap.tsx. Net: 5 call sites → 1, no behavior change. Verified live: anonymous landings on /, /catalogs, /tasks, and /chat/[roomId] all still get the Privy "Log in or sign up" modal — confirms the UserProvider centralization covers every authenticated route.
  • Stamp sessions.artist_id from selected artist + refactor useArtists (chat#1759, merged as 4fad683b) — new-chat bootstrap now passes selectedArtist.account_id as artistId to POST /api/sessions, persisting it on sessions.artist_id so new chats appear in the sidebar's artist filter and the /api/chats/{chatId}/artist lookup returns the correct artist instead of 404'ing. useArtists rewritten with useQuery-based roster and useMemo-derived selection (single render) — closes the duplicate-POST race the old useEffect-driven selection caused. Header/Artist.tsx hard-navigates to /chat via window.location.href on artist-switch inside a tagged chat so the bootstrap re-mints under the new context. Four OCP extractions during review: hooks/artists/useArtistSelection.ts (selection logic), hooks/artists/useArtistsRoster.ts (roster fetch/cache/refetch), hooks/sessions/useProvisionChatSession.ts (mutation lifecycle), lib/sessions/provisionChatSession.ts (api call combo) — useNewChatBootstrap shrunk to ~30 LOC of provider wiring. Known follow-up flagged in PR comment: each artist switch currently provisions 3 sessions + 3 sandboxes (one input transition per render during hard-nav settling); only the final session is used by <Chat>, but the first two are orphaned. Worth tightening the sameInputs guard or debouncing input changes — not blocking.
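
The known follow-up in the last item (three sessions + three sandboxes provisioned per artist switch) mentions tightening the sameInputs guard. A minimal sketch of such a guard is below. The input shape, `inputsKey`, and `createProvisionGuard` are hypothetical names for illustration, not the code in chat#1759. Note that a guard like this only suppresses repeat calls with identical inputs; if the three transitions during hard-nav settling carry genuinely different values, debouncing would be needed as well.

```typescript
// Hypothetical input shape; the real hook's inputs are not shown in this issue.
type BootstrapInputs = {
  accountId: string;
  organizationId: string | null;
  artistId: string | null;
};

// Serialize the inputs into a stable key so two renders with equal
// values compare equal even when the object identity differs.
function inputsKey(inputs: BootstrapInputs): string {
  return JSON.stringify([inputs.accountId, inputs.organizationId, inputs.artistId]);
}

// Returns a predicate that is true only when the inputs differ from the
// last set it approved, so the caller provisions once per distinct input set.
function createProvisionGuard(): (inputs: BootstrapInputs) => boolean {
  let lastKey: string | null = null;
  return (inputs: BootstrapInputs): boolean => {
    const key = inputsKey(inputs);
    if (key === lastKey) return false; // same inputs: skip the duplicate POST
    lastKey = key;
    return true;
  };
}
```

A caller would check the guard before firing POST /api/sessions + POST /api/sandbox and skip the call when it returns false.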

Open — next up (in merge order; pick from the top)

Each step is independently scoped. Bracketed labels link to existing PRs. Blocked-by relationships are noted explicitly; everything else can be picked up in parallel.

  1. chat: drop legacy useChatTransport branch + delete app/chat/[roomId]/page.tsx. NEW PR (~30 LOC). Now that HomePage uses NewChatBootstrap (chat#1760, merged), every <Chat> mount has a sessionId. Make sessionId required through useVercelChat / VercelChatProvider / Chat; delete the if(!sessionId) legacy fork in useChatTransport; delete app/chat/[roomId]/page.tsx so legacy URLs return 404 (clean deprecation, per the reversal of the original URL-preservation decision). Patterns lifted from chat#1755. No blockers: unblocked by chat#1760. (Note: LegacyAutoLogin already removed by chat#1761.)

  2. Cleanup. Close chat#1754 (superseded; chat-side direct-DB approach is wrong layer) and chat#1755 (mine, Approach A pivoted away from), each with a one-line rationale comment.

Open — pre-existing docs gaps (separate work, low priority)

Surfaced while verifying api#628 responses against docs#228. Neither was introduced by this cutover work; tracked here for visibility:

  • Session.isNewBranch is marked required in the OpenAPI schema but is missing from every actual response (POST/GET/PATCH /api/sessions). Either drop the required-tag or have the api always emit it.
  • PATCH /api/sessions/{id} has no documented request body schema (PatchSessionBody doesn't exist in the spec) even though the endpoint clearly accepts { title }. Small docs PR to add it.

Open — also wanted (not in the cutover critical path)

  • Defer bootstrap wait from spinner to send button. Today NewChatBootstrap blocks the render with a full-screen spinner while POST /api/sessions + POST /api/sandbox resolve — multi-second wait before the input is even visible. UX should be: render <Chat> with the input visible + enabled immediately, kick off the bootstrap in parallel, and only block the Send button on the bootstrap promise (with a small inline "preparing…" affordance) before firing the first message. Either restructure NewChatBootstrap to surface bootstrap state to <Chat> (option A) or eager-fire the bootstrap right after login from a top-level effect and have useNewChatBootstrap find the already-provisioned session via cache (option B, pre-cutover pattern from chat#1564). chat#1564 is closed as superseded (predated the cutover, targets the old /api/sandboxes/setup); reanimate the intent against the new per-session /api/sandbox flow. Worth measuring the actual wait time on prod before committing to a fix shape — if median is <2s the spinner is acceptable and this becomes a polish item; if >5s the bottleneck may be on the api side (warm sandbox pool, faster ensurePersonalRepo) rather than the client.

  • Credit-spend visibility digest (Telegram). Re-scope the dropped new-conversation Telegram ping into a spend-monitoring digest. A Vercel Cron in api (*/10 * * * * → internal route, e.g. POST /api/internal/credit-spend-digest) reads usage_events, groups spend by account over the window, and posts a top-spenders summary to Telegram.

    • Window: time-based — created_at >= now() - interval '10 minutes' (stateless; accept minor boundary drift. Upgrade to a watermark cursor only if missed events ever matter).
    • Aggregation: in Postgres (GROUP BY account_id + model_id / agent_type, SUM(cents)), not JS. Join account_id → account name/email for a readable message.
    • Content: top N accounts by total credits (desc); per account show the how — credits by model_id, main-vs-subagent split, turn count, token totals.
    • Empty window: no-op (no empty pings). Reuse lib/telegram/trimMessage.ts for Telegram's length cap.
    • Supersedes the "Telegram new-conversation notifications" line in Accepted regressions — deliberate re-scope (per-session ping → per-window spend digest), not a 1:1 port.
  • Phase 4: legacy code cleanup (only after the open steps above are stable on test and the cutover bundle is promoted to main). Delete lib/chat/handleChatStream.ts, app/api/chat/route.ts, getGeneralAgent.ts, setupChatRequest.ts, setupToolsForRequest.ts, MCP/Composio plumbing, lib/chat/handleChatCompletion.ts (memories + email + Telegram); that is, remove every remaining reader of rooms/memories.

  • Drop legacy rooms + memories tables (LAST — only after the chat workflow cutover is merged to main, the final straggler backfill re-run has caught any rooms created in the meantime, and Phase 4 has removed the last rooms/memories readers). Requires a migration file in recoupable/database to drop them (plus FK-dependent tables like room_reports / segment_rooms / memory_emails as applicable). Destructive + irreversible — snapshot first and confirm nothing still reads rooms/memories before dropping.
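
For the credit-spend digest item above, the message-building step could look roughly like the sketch below. It assumes the grouping and summing already happened in Postgres, as the item specifies, and only covers the "top N, descending" and "empty window is a no-op" rules. `SpendRow` and `buildDigest` are hypothetical names, and the row fields are an assumed shape rather than the real usage_events columns.

```typescript
// Hypothetical row shape: one row per account, already aggregated in Postgres.
type SpendRow = {
  accountName: string;
  totalCredits: number;
  turnCount: number;
};

// Builds the digest text for one window, or returns null when the window
// is empty so the caller can skip the Telegram post entirely.
function buildDigest(rows: SpendRow[], topN: number): string | null {
  if (rows.length === 0) return null; // empty window: no ping

  // Copy before sorting so the caller's array is not mutated.
  const top = [...rows]
    .sort((a, b) => b.totalCredits - a.totalCredits)
    .slice(0, topN);

  const lines = top.map(
    (row, i) => `${i + 1}. ${row.accountName}: ${row.totalCredits} credits, ${row.turnCount} turns`
  );
  return ["Top spenders (last 10 minutes)", ...lines].join("\n");
}
```

The returned string would still need to pass through the length-cap helper the item mentions (lib/telegram/trimMessage.ts) before posting; its signature is not shown in this issue, so it is left out of the sketch.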

Architecture decisions

  • cloneUrl construction lives on api. Personal sessions: recoupable/<accountId>. Org sessions: recoupable/<organizationId>. Derived server-side via ensurePersonalRepo (api #618, #620) inside createSessionHandler and returned on session.cloneUrl. Clients never build GitHub URLs.
  • chatId source of truth is api. createSessionHandler mints chat.id; the client uses it as the <Chat> surface id, the workflow body's chatId, and the URL's chat id.
  • Canonical URL is /sessions/{sessionId}/chats/{chatId} (per the pivot from the original "preserve /chat/[roomId]" stance). Legacy /chat/[roomId] URLs are being deprecated: no redirect, eventual 404 once step 1 of the open list lands.
  • All chat-related api endpoints are session-scoped (GET/PATCH/DELETE /api/sessions/{sid}/chats/{cid}, POST /api/sandbox, POST /api/sessions/{sid}/chats/{cid}/read, POST /api/sessions { artistId }). The chat client carries sessionId in URLs + state to match.
  • sessions.artist_id is the canonical artist link, not chats.artist_id. Aligns with the multi-chat-per-session future direction; one artist context per session, all its chats inherit. Mirrors legacy rooms.artist_id.
  • Accepted regressions (the explicit cost of architectural unification): no MCP tools (artist data, music industry APIs), no Composio tools (Sheets, Drive, Docs, TikTok), no artist context in system prompt, no chat title generation, no send_email tool, no Telegram new-conversation notifications (being re-scoped into the credit-spend digest above). Re-add later if needed once the workflow path is stable.
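
The first and third decisions above can be illustrated with a small sketch. It shows only the repo-slug rule (personal vs. org) and the canonical chat path; `SessionOwner`, `repoSlugFor`, and `canonicalChatPath` are hypothetical names, and this is not the body of ensurePersonalRepo, whose implementation and full clone-URL format are not shown in this issue.

```typescript
// Hypothetical discriminated union for who owns a session.
type SessionOwner =
  | { kind: "personal"; accountId: string }
  | { kind: "org"; organizationId: string };

// Server-side rule from the decisions above:
// personal sessions map to recoupable/<accountId>, org sessions to recoupable/<organizationId>.
function repoSlugFor(owner: SessionOwner): string {
  return owner.kind === "org"
    ? `recoupable/${owner.organizationId}`
    : `recoupable/${owner.accountId}`;
}

// Canonical client URL path for a chat; both ids come from the api response.
function canonicalChatPath(sessionId: string, chatId: string): string {
  return `/sessions/${encodeURIComponent(sessionId)}/chats/${encodeURIComponent(chatId)}`;
}
```

Keeping the slug rule on the server matches the "clients never build GitHub URLs" decision; the client only needs the path helper, fed with the session and chat ids it received from POST /api/sessions.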

Inherited gaps from the open-agents cutover

Tracked in recoupable/api#605. chat.recoupable.com inherits them at cutover; not blocking per @sweetmantech direction:

  • Stop monitor — Stop button doesn't actually halt the model
  • Multi-tool-call traces (now confirmed broken client-side — see follow-up above)
  • Sandbox state persistence missing
  • Sub-agent (task tool) credit attribution missing
  • ask_user_question result persistence on device switch missing
  • workflow_runs telemetry missing
