
fix(studio): server-side tool execution via needsApproval flow (AI-658)#45556

Draft
mattrossman wants to merge 8 commits into
master from
mattrossman/ai-658-server-side-approval

Conversation

@mattrossman
Contributor

Second attempt at fixing AI-658, superseding #45339.

Instead of patching tracing around client-side SQL execution, this refactors execute_sql and deploy_edge_function to use AI SDK's needsApproval pattern — execution moves to the server, which resolves the split-trace problem structurally.

  • execute_sql and deploy_edge_function tools now have needsApproval: true and server-side execute handlers; client no longer runs mutations directly
  • UI approval flow uses addToolApprovalResponse instead of addToolResult; DisplayBlockRenderer and EdgeFunctionRenderer are stripped of client-side mutation hooks
  • Braintrust span context is threaded across the approval boundary so Turn 1 and Turn 2 land in the same trace; Turn 1 suppresses online scoring until Turn 2 logs the combined output
  • Restores edge function replace-warning guard (regression from initial attempt)

Closes AI-658

References

- Upgrade ai package to 6.0.173
- Add needsApproval + server-side execute to execute_sql and deploy_edge_function
- Switch sendAutomaticallyWhen to lastAssistantMessageIsCompleteWithApprovalResponses
- Client uses addToolApprovalResponse instead of addToolResult
- Remove client-side SQL/deploy execution from renderers; move to server
- Skip Braintrust span output on Turn 1 (approval boundary); recover approved tool data from rawMessages on Turn 2
- Add ai to pnpm minimumReleaseAgeExclude to unblock same-day publish
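The shape of the change can be sketched as follows. This is an illustrative, self-contained stub, not the actual Studio code: the real tools are built with the AI SDK's `tool()` helper, and the type names here are invented for the sketch.

```typescript
// Illustrative stub of the needsApproval pattern described above.
// The real code uses the AI SDK's `tool()` helper; this sketch only
// models the shape so the flow is visible without external imports.

type SqlInput = { sql: string };
type SqlResult = { rows: unknown[] } | { error: string };

interface ApprovableTool<In, Out> {
  needsApproval: boolean;               // model pauses and waits for the client's approval
  execute: (input: In) => Promise<Out>; // now runs on the server, inside the same trace
}

const executeSqlTool: ApprovableTool<SqlInput, SqlResult> = {
  needsApproval: true,
  execute: async ({ sql }) => {
    // Server-side execution: the pre-approval and post-approval turns
    // now share one trace, which is what fixes the split-trace problem.
    if (!sql.trim()) return { error: 'empty statement' };
    return { rows: [] }; // placeholder; the real handler runs the query
  },
};
```

On the client, the approval response is sent with `addToolApprovalResponse` (rather than `addToolResult`), and the follow-up request fires once `lastAssistantMessageIsCompleteWithApprovalResponses` is satisfied, per the bullet list above.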
@vercel

vercel Bot commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| studio-self-hosted | Ready | Preview, Comment | May 4, 2026 9:01pm |
| studio-staging | Ready | Preview, Comment | May 4, 2026 9:01pm |

**6 Skipped Deployments**

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| studio | Ignored | May 4, 2026 9:01pm |
| design-system | Skipped | May 4, 2026 9:01pm |
| docs | Skipped | May 4, 2026 9:01pm |
| learn | Skipped | May 4, 2026 9:01pm |
| ui-library | Skipped | May 4, 2026 9:01pm |
| zone-www-dot-com | Skipped | May 4, 2026 9:01pm |


@supabase

supabase Bot commented May 4, 2026

This pull request has been ignored for the connected project xguihxuzqibwxjnimxev because there are no changes detected in supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@coderabbitai
Contributor

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0178e3ca-4e86-408e-bac0-3ea7c41d12ff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



Comment @coderabbitai help to get the list of available commands and usage tips.

…I-658)

- Add isApprovalContinuation/getLastUserText helpers to message-utils.ts
- Merge duplicate JSX branches in MessagePartExecuteSql
- Fix existingFunction race in EdgeFunctionRenderer (restore eager fetch)
- Remove parent span threading so Turn 2 is its own root trace and
  online scorers see input/output on the root span
- Recover Turn 1 text parts from rawMessages so the full response
  (pre- and post-approval) appears in the output
- Use shared isApprovalContinuation/getLastUserText helpers
- Replace part.type.replace('tool-', '') with getToolName()
mattrossman added a commit that referenced this pull request May 12, 2026
…h needsApproval (#45654)

## Motivation

When Assistant runs a potentially destructive tool like `execute_sql`,
it stops the LLM request and prompts for client-side approval and
execution of the tool. After approval, a second request kicks off under
a separate trace. This has made scoring and
[Topics](https://www.braintrust.dev/blog/topics) classification
challenging, as the generated `output` is split across stateless
requests. The [span-level
scoring](https://www.braintrust.dev/docs/evaluate/custom-code#score-spans)
approach we've used thusfar (after the LLM call, we massage the result
into an `output` payload that's stuck onto the root span) has been
cumbersome and led to invalid scores / topics where only part of the
assistant response is considered. It's also inefficient, as we're
duplicating potentially large info (like the `search_docs` output) that
already exists within the trace.

An alternative to scoring spans is to [score
traces](https://www.braintrust.dev/docs/evaluate/custom-code#score-traces).
Braintrust [best
practices](https://www.braintrust.dev/docs/evaluate/score-online#best-practices)
advise:

> Use span scope for evaluating individual operations or outputs. Use
trace scope for evaluating multi-turn conversations, overall workflow
completion, or when your scorer needs access to the full execution
context.

We've also received [direct
guidance](https://supabase.slack.com/archives/C05QYJBLX89/p1777925770927149?thread_ts=1777905716.911979&cid=C05QYJBLX89)
from their team to use this approach.

## Changes

Migrates eval scorers from custom `AssistantEvalOutput` shape to
trace-level scoring via `trace.getThread()` / `trace.getSpans()`, with
thread parsing that scores the full latest Assistant turn and passes
prior conversation separately where relevant.
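A trace-level scorer then sees the whole trace rather than one span. A hedged sketch: `getThread()`/`getSpans()` mirror the Braintrust helpers named above, but the surrounding types and the toy heuristic are illustrative stubs, not the real scorer.

```typescript
// Illustrative trace-scorer shape; `getThread`/`getSpans` mirror the
// Braintrust helpers named in this PR, stubbed here for self-containment.
type Span = { name: string; input?: unknown; output?: unknown };
type Trace = { getThread: () => string; getSpans: () => Span[] };

function scoreCompleteness(trace: Trace): number {
  const thread = trace.getThread(); // full serialized conversation
  const spans = trace.getSpans();   // every span, including tool calls
  // Toy heuristic standing in for an LLM judge: the serialized response
  // should carry a `[called tool_name]` marker for each tool that ran.
  const toolSpans = spans.filter((s) => s.name.startsWith('tool.'));
  const covered = toolSpans.filter((s) =>
    thread.includes(`[called ${s.name.slice('tool.'.length)}]`)
  );
  return toolSpans.length === 0 ? 1 : covered.length / toolSpans.length;
}
```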

Moves `execute_sql` and `deploy_edge_function` from client-side
execution after approval to AI SDK `needsApproval` + server-side
`execute()`. SQL results returned to the model are gated by AI opt-in
level, so row data is only included with `schema_and_log_and_data`;
otherwise the tool returns the no-data-permissions sentinel.
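The gating reduces to a small check. A minimal sketch, assuming the opt-in level names other than `schema_and_log_and_data` and the sentinel text, which are placeholders here:

```typescript
// Sketch of the opt-in gating described above. Only the highest level,
// `schema_and_log_and_data` (from the PR), returns row data to the model;
// the other level names and the sentinel string are illustrative.
type AiOptInLevel =
  | 'disabled'
  | 'schema'
  | 'schema_and_log'
  | 'schema_and_log_and_data';

const NO_DATA_SENTINEL =
  '<row data withheld: AI data permissions not granted>';

function gateSqlResult(
  level: AiOptInLevel,
  rows: unknown[]
): unknown[] | string {
  // Row data only flows to the model at the highest opt-in level.
  return level === 'schema_and_log_and_data' ? rows : NO_DATA_SENTINEL;
}
```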

Adds `metadata.isFinalStep` to disambiguate multiple LLM requests within
an "assistant" turn due to tool call requests/responses. For online
evals, this means we should configure automations to only score traces
with `metadata.isFinalStep = true` to ensure we're judging the complete
generated response.
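The disambiguation itself is simple; a toy model (the real metadata is attached by the chat route, so this helper is hypothetical):

```typescript
// An assistant "turn" may span several LLM requests (tool round-trips);
// only the last request in the turn carries isFinalStep = true.
function stepMetadata(stepIndex: number, totalSteps: number) {
  return { isFinalStep: stepIndex === totalSteps - 1 };
}
```

An online-eval automation would then filter on `metadata.isFinalStep = true` so only the completed response is judged.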

Other minor kaizen changes:
- Renamed `promptProviderOptions` to `systemProviderOptions` to clarify
that this is associated with the "system" message and disambiguate from
the root `providerOptions`
- Adds `evals/trace-utils.ts` to handle Zod validation of the `unknown`
span shapes from Braintrust, to more easily access typed inputs/output
on tool spans.
- Bumps AI SDK floor version `^6.0.116` → `^6.0.174`
- Tweaked the "Conciseness" scorer to not unfairly dock points for the
new `[called tool_name]` labels in serialized assistant response

## Verification

In the studio staging build, I asked Assistant to create a todos table
with 3 sample todos. I manually approved the `execute_sql` call and saw
Assistant generate text before & after the call.

In Braintrust I verified two traces were produced (see [filtered
logs](https://www.braintrust.dev/app/supabase.io/p/Assistant/logs?v=Staging&tvt=trace&search={%22filter%22:[{%22text%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22label%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22originType%22:%22btql%22},{%22text%22:%22%2560Chat%2520ID%2560%2520%253D%2520%25221cb2ac45-e5e7-458c-9da4-3bf6863b8842%2522%22,%22label%22:%22Chat%2520ID%2520equals%25201cb2ac45-e5e7-458c-9da4-3bf6863b8842%22,%22originType%22:%22form%22}]})),
the first with `metadata.isFinalStep = false` and the second with
`metadata.isFinalStep = true`.

In the Braintrust staging scorers, I ran the preview Completeness scorer
on the second trace and verified it sees the complete Assistant response
including markers for tool calls ([link to
trace](https://www.braintrust.dev/app/supabase.io/p/Assistant%20(Staging%20Scorers)/trace?object_type=project_logs&object_id=b5214b62-ad1e-4929-9d5b-40b1daebe948&r=0ed0a4f8-8aff-4a34-bb1d-1df1d88a5070&s=ff9015f8-6bf7-4ab3-83a9-ca4e69e27e82))

<img width="1193" height="960" alt="CleanShot 2026-05-07 at 11 27 10@2x"
src="https://github.com/user-attachments/assets/509d4858-c3a1-4068-986d-3aa4d5617d1a"
/>

I also tested the `deploy_edge_function` workflow and verified it still
prompts for permission and warns on deployment of existing functions.

**References**
- https://www.braintrust.dev/docs/evaluate/custom-code#score-traces
-
https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#tool-execution-approval

Supersedes #45556 and
#45339

Closes AI-473

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Tool actions (SQL execution, edge-function deploy) now require explicit user Approve/Deny before proceeding.

* **Improvements**
  * Assistant pauses for approval responses before sending follow-ups, giving clearer control over risky actions.
  * Deploy/replace flows show confirmation and clearer replace warnings.
  * Evaluation/scoring updated to use richer trace data for more accurate assistant performance signals.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
