
AI analytics #1339

Open

aadesh18 wants to merge 81 commits into dev from ai-analytics

Conversation

Collaborator

@aadesh18 aadesh18 commented Apr 15, 2026

This PR adds an AI analytics dashboard to the internal tool and centralises AI query observability. Every call through the /ai/query and /integrations/ai-proxy endpoints is now logged to SpacetimeDB with token counts, cost, latency, and full conversation replay. The internal tool gains a new "Unified AI Endpoint Analytics" tab backed by a new useAiQueryLogs SpacetimeDB subscription, plus reviewer enrollment and unmark-reviewed flows.

How to review this PR:

  1. Go through the changes in apps/backend/src/app/api/latest/ai/query/[mode]/route.ts and apps/backend/src/app/api/latest/integrations/ai-proxy/[[...path]]/route.ts. These are the entry points to the logging.
  2. Go through the apps/backend/src/app/api/latest/internal folder. It contains the routes that interact with SpacetimeDB.
  3. Other changes in backend:
    1. Under lib/ai - most of these files are helpers that abstract a lot of logic out of the ai-proxy and ai-query endpoints, especially the logging logic
    2. .env - added STACK_SPACETIMEDB_SERVICE_TOKEN
  4. Look through the internal tool.

Summary by CodeRabbit

  • New Features

    • AI Query Usage dashboard: filtering, sorting, charts, metrics, details view
    • Reviewer enrollment for SpacetimeDB and “Unmark reviewed” action
  • Improvements

    • Separate Usage and Calls tabs with session-persistent selection
    • Call log: pagination, sortable “Human Reviewed” column, optimistic mark/unmark UX
    • Public published Q&A projection for safe read-only consumption
  • Infrastructure

    • Centralized AI proxy, logging, and query telemetry; HTTP-based reducer calls and safer DB access controls

mantrakp04 and others added 30 commits March 23, 2026 10:48
- Added new internal API endpoint for documentation tools, allowing actions such as listing available docs, searching, and fetching specific documentation by ID.
- Updated environment configuration to support optional internal secret for enhanced security.
- Refactored existing search functionality to utilize the new docs tools API instead of the previous MCP server.
- Improved error handling and response parsing for documentation-related requests.
- Expanded documentation to clarify the relationship between the new tools and existing API functionalities.

This update streamlines the documentation access process and enhances the overall developer experience.
- Introduced error capturing for failed HTTP requests in the docs tools API, improving debugging capabilities.
- Updated the API response for unsupported methods to include an 'Allow' header, clarifying the expected request type.

These changes enhance the robustness of the documentation tools integration and improve developer experience.
- Updated the key name in the capabilities section of the API documentation to follow a consistent naming convention, improving clarity and maintainability.
The .gitmodules was updated in d22593d to point at
apps/backend/src/private/implementation, but the gitlink entry (mode
160000) was never added to the tree. This caused
`git clone --recurse-submodules` to silently skip the private submodule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on for docs tools

- Added `STACK_DOCS_INTERNAL_BASE_URL` to backend `.env` and `.env.development` files for AI tool bundle configuration.
- Removed references to `STACK_INTERNAL_DOCS_TOOLS_SECRET` from backend and docs environment files and validation logic from the docs tools API route.
- Introduced a new `.env` file for the docs app with essential configuration variables.
Collaborator

@nams1570 nams1570 left a comment


This is a partial review. Identified some potential bugs and set up some discussions. Also, maybe we should log the system prompt to our internal tool as well? Otherwise it's a bit hard to debug.

Comment thread apps/backend/src/app/api/latest/ai/query/[mode]/route.ts
Comment thread apps/backend/src/app/api/latest/ai/query/[mode]/route.ts Outdated
Comment thread apps/backend/src/app/api/latest/ai/query/[mode]/route.ts Outdated
Comment thread apps/backend/src/lib/ai/qa/verified-qa.ts
Comment thread apps/backend/src/app/api/latest/ai/query/[mode]/route.ts Outdated
Comment thread apps/backend/src/app/api/latest/ai/query/[mode]/route.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-query-handlers.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-query-handlers.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-query-handlers.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-query-handlers.ts Outdated
Collaborator

@nams1570 nams1570 left a comment


Quick partial review. It looks like internal-tool is a new project, which means it doesn't get rolled into the existing instrumentation in instrumentation.ts. This means no Sentry, no Jaeger, no OTel. If things go wrong, we won't be able to see it.

Comment thread apps/backend/src/app/api/latest/integrations/ai-proxy/[[...path]]/route.ts Outdated
Comment thread apps/backend/src/app/api/latest/integrations/ai-proxy/[[...path]]/route.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-proxy-handlers.ts Outdated
Comment thread apps/backend/src/lib/ai/ai-proxy-handlers.ts Outdated
systemPromptId: apiKey === "stack-auth-proxy" ? "stack-cli" : apiKey,
quality: "unknown",
speed: "unknown",
modelId: String(parsed.model ?? OPENROUTER_DEFAULT_MODEL),
Collaborator


why cast this here? isn't it already validated earlier?

Collaborator Author


fixed it

Comment thread apps/backend/src/lib/ai/spacetimedb-client.ts
Comment on lines +73 to 76
if (!res.ok) {
const preview = (await res.text()).slice(0, 200);
throw new StackAssertionError(`Reducer ${reducer} failed (${res.status}): ${preview}`);
}
Collaborator


This would mean that any SpacetimeDB failure would be treated as a 500, no? Internal server error? But if a row was deleted while someone was looking at it, that's a 404, not a 500, right?

Collaborator Author


yep, fixed
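For context, a minimal sketch of the kind of mapping this fix implies, assuming the reducer HTTP response exposes a usable status code (the error class and function name here are hypothetical, not the actual fix):

```ts
// Hypothetical sketch: map SpacetimeDB reducer failures onto distinct error
// types instead of treating every failure as an internal server error.
class RowNotFoundError extends Error {}

async function checkReducerResponse(reducer: string, res: Response): Promise<void> {
  if (res.ok) return;
  const preview = (await res.text()).slice(0, 200);
  if (res.status === 404) {
    // e.g. the row was deleted while someone was looking at it: surface a 404.
    throw new RowNotFoundError(`Reducer ${reducer}: ${preview}`);
  }
  // Everything else is a genuine server-side failure (500).
  throw new Error(`Reducer ${reducer} failed (${res.status}): ${preview}`);
}
```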

Comment thread apps/backend/.env Outdated
STACK_SPACETIMEDB_URL=# SpacetimeDB host URL; default empty (logging disabled)
STACK_SPACETIMEDB_DB_NAME=# SpacetimeDB database name
STACK_MCP_LOG_TOKEN=# shared secret gating the log_mcp_call reducer; must match EXPECTED_LOG_TOKEN in apps/internal-tool/spacetimedb/src/index.ts
STACK_SPACETIMEDB_SERVICE_TOKEN=# optional; pre-provisioned SpacetimeDB identity token for the backend service. To mint: `curl -X POST <STACK_SPACETIMEDB_URL>/v1/identity` and copy the `token` field.
Collaborator


Could you clarify? This isn't super clear

Collaborator Author


yep, changed the comment

conversationId: string | undefined,
};

export async function logAiQuery(entry: AiQueryLogEntry): Promise<void> {
Collaborator


Nit: what do you think about having some validation for the entry fields here to make sure they're not NaN or infinity or something like that? Asking because despite it not being a possibility right now, if there is a bug, we would end up silently dropping logs.

Collaborator Author


good idea, added validation and captureError
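A minimal sketch of what such validation could look like, assuming numeric entry fields; the helper name is hypothetical, and the real fix reportedly pairs this with captureError:

```ts
// Hypothetical sketch: reject non-finite numeric fields before the entry
// reaches the reducer, so a bug surfaces loudly instead of silently
// dropping log rows.
function assertFiniteNumbers(entry: Record<string, unknown>): void {
  for (const [key, value] of Object.entries(entry)) {
    if (typeof value === "number" && !Number.isFinite(value)) {
      // The real code reportedly calls captureError; a plain throw stands in here.
      throw new Error(`logAiQuery: field "${key}" is not a finite number: ${value}`);
    }
  }
}
```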

Comment thread apps/internal-tool/src/lib/env.ts Outdated
Collaborator

@nams1570 nams1570 left a comment


End of review. Some more bugs

Comment thread apps/backend/src/lib/ai/mcp-logger.ts Outdated
Comment thread apps/backend/src/lib/ai/openrouter-usage.ts Outdated
Comment on lines +78 to +79
while ((boundary = buffer.indexOf("\n\n")) !== -1) {
const block = buffer.slice(0, boundary);
Collaborator


Potential bug: I think this would break. Per https://html.spec.whatwg.org/multipage/server-sent-events.html#parsing-an-event-stream, SSE blocks can terminate with carriage returns (\r) as well: \n\n, \r\n\r\n, and \r\r are all valid separators. This is a pretty high-impact bug; the buffer would keep growing imo.

Collaborator Author


fixed it
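A minimal sketch of spec-compliant block splitting, assuming the same string-buffer approach as the snippet above (the function name is illustrative):

```ts
// Hypothetical sketch: per the WHATWG event-stream grammar, a blank line can
// be "\n\n", "\r\n\r\n", or "\r\r", so scan for the earliest delimiter
// instead of only "\n\n".
function nextSseBlock(buffer: string): { block: string, rest: string } | null {
  let earliest = -1;
  let delimLength = 0;
  for (const delim of ["\r\n\r\n", "\n\n", "\r\r"]) {
    const idx = buffer.indexOf(delim);
    if (idx !== -1 && (earliest === -1 || idx < earliest)) {
      earliest = idx;
      delimLength = delim.length;
    }
  }
  if (earliest === -1) return null; // no complete block yet; keep buffering
  return {
    block: buffer.slice(0, earliest),
    rest: buffer.slice(earliest + delimLength),
  };
}
```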

Comment on lines +91 to +93
} finally {
reader.releaseLock();
}
Collaborator


Potential bug: So the issue here is that this finally captures the loop, but not the few statements above it. In general, you want the ability to release a lock on failure states as soon as the lock is acquired; think Go's "defer" clause.

Collaborator Author


yep, fixed
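A minimal sketch of the defer-like shape being asked for (names are illustrative):

```ts
// Hypothetical sketch: open the try/finally immediately after acquiring the
// reader, so the lock is released even if code before the read loop throws.
async function drainStream(stream: ReadableStream<Uint8Array>): Promise<number> {
  const reader = stream.getReader();
  try {
    // Any setup that might throw now lives inside the protected region.
    let total = 0;
    for (;;) {
      const { done, value } = await reader.read();
      if (done) return total;
      total += value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
}
```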

Comment thread apps/backend/src/lib/ai/qa-reviewer.ts Outdated
humanCorrectedQuestion: args.correctedQuestion,
humanCorrectedAnswer: args.correctedAnswer,
humanReviewedAt: row.humanReviewedAt ?? ctx.timestamp,
humanReviewedBy: row.humanReviewedBy ?? args.reviewedBy,
Collaborator


Design question / bug: This looks a little weird. You fall back to whoever issued the correction/update, but keep the original. That means you could be in a position where it says "xyz reviewed this" but abc was the last person to update it. I guess this is also a design question: do we want an audit log for this?

Collaborator Author


yea it was a bug, fixed it. now we keep the latest reviewer

humanReviewedAt: row.humanReviewedAt ?? ctx.timestamp,
humanReviewedBy: row.humanReviewedBy ?? args.reviewedBy,
publishedToQa: args.publish,
publishedAt: args.publish ? (row.publishedAt ?? ctx.timestamp) : undefined,
Collaborator


Is it possible for something to be unpublished and then republished? If so, we lose info on when something was first published, if that is important.

Collaborator Author


yes it is possible. i feel like we wanna keep the latest publishedAt (the field should basically reflect the most recent time when it was published)

}
ctx.db.operators.insert({
identity: args.identity,
addedAt: ctx.timestamp,
Collaborator


does ctx change based on where someone logs in from? if so this would keep getting rewritten

Collaborator Author


yes

ctx.db.operators.insert({
identity: ctx.sender,
addedAt: ctx.timestamp,
stackUserId: '__service__',
Collaborator


this is fine for now since it's purely internal, but might be good to have a guard to reject any stackUserIds that are created with the name __service__

Collaborator Author


added a guard
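A minimal sketch of such a guard, assuming the sentinel value shown in the snippet above (the function name is hypothetical):

```ts
// Hypothetical sketch: reject caller-supplied operator rows that try to claim
// the reserved service sentinel, so only the service-enrollment path can use it.
const SERVICE_SENTINEL = "__service__";

function assertNotServiceSentinel(stackUserId: string): void {
  if (stackUserId === SERVICE_SENTINEL) {
    throw new Error(`stackUserId "${SERVICE_SENTINEL}" is reserved for the backend service`);
  }
}
```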

if (row == null) {
throw new SenderError('Call log not found for correlationId: ' + args.correlationId);
}
ctx.db.mcpCallLog.id.delete(row.id);
Collaborator


ai_query_log has mcpCorrelationId, right? Would this delete then orphan the ai_query_log rows? Should we have some sort of cascading delete?
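For illustration, a minimal sketch of the cascading delete being suggested; the ctx.db accessor shapes are assumptions mirrored from the snippets in this thread, with minimal local types so the sketch stands alone:

```ts
// Hypothetical sketch: delete dependent ai_query_log rows before the parent
// mcp_call_log row so nothing is orphaned. Table/index shapes are assumed.
type QueryLogRow = { id: bigint, mcpCorrelationId: string | undefined };
type Ctx = {
  db: {
    aiQueryLog: { iter(): Iterable<QueryLogRow>, id: { delete(id: bigint): void } },
    mcpCallLog: { id: { delete(id: bigint): void } },
  },
};

function deleteCallLogCascading(ctx: Ctx, correlationId: string, callLogId: bigint): void {
  for (const log of ctx.db.aiQueryLog.iter()) {
    if (log.mcpCorrelationId === correlationId) ctx.db.aiQueryLog.id.delete(log.id);
  }
  ctx.db.mcpCallLog.id.delete(callLogId);
}
```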

This PR introduces a context chip bar for the AI dashboard chat panel —
users can click widgets in the sandbox iframe, trigger "Add a
component", or auto-capture runtime errors as chips that are prepended
to the next AI request. It also adds a patchDashboard tool for surgical
find-and-replace edits alongside the existing updateDashboard, streaming
dashboard generation, and an improved system prompt with hook-ordering
guidance for generated components.
Collaborator Author

@aadesh18 aadesh18 left a comment


made changes based on your comments

Comment on lines +99 to +103
content: systemPrompt,
...(isAnthropic && {
providerOptions: { openrouter: { cacheControl: { type: "ephemeral" } } },
}),
};
Collaborator Author


As we discussed, most of the routes are being hit at least once every 5 mins. However, in future we should think about not having the cache for routes that aren't hit too often. Added a comment explaining the same.

@mantrakp04
Collaborator

@greptile-ai review

Comment on lines +36 to +37
const qaId = BigInt(body.qaId);
await callReducerStrict("update_qa_entry", [token, qaId, body.question, body.answer, editor]);
Contributor


P1 Non-atomic two-phase mutation

The handler issues two separate HTTP reducer calls without any rollback mechanism. If update_qa_entry succeeds but set_qa_published fails (e.g. a network blip or SpacetimeDB 502), the entry's text is saved with the new question/answer but its published flag is left in the old state. The client receives a 502 and the user's intent (e.g. "save and publish") is only half-executed, creating a silent inconsistency.

Consider consolidating both mutations into a single SpacetimeDB reducer (e.g. update_qa_entry_and_publish) that applies both changes atomically within one transaction.

Path: apps/backend/src/app/api/latest/internal/mcp-review/update-qa-entry/route.ts (lines 36-37)

Comment on lines +89 to +90
correlationId,
mode,
Contributor


P2 mcpCorrelationId is always equal to correlationId

correlationId is the UUID freshly minted for this AI query, and mcpCorrelationId is set to that same value when mcpCallMetadata is present. In logIfMcpToolCall, the MCP call log row is also written with the same correlationId. So ai_query_log.mcpCorrelationId and ai_query_log.correlationId carry identical values, making the field redundant. The field name implies a different ID (the MCP log's own correlation key), which will confuse future maintainers. Either give the MCP call log its own distinct UUID and store that here, or remove the mcpCorrelationId column and rely on the shared correlationId.

Path: apps/backend/src/app/api/latest/ai/query/[mode]/route.ts (lines 89-90)

return {
correlationId,
mode: parsed.stream === true ? "stream" : "generate",
systemPromptId: apiKey === "stack-auth-proxy" ? "stack-cli" : apiKey,
Contributor


P2 The "stack-auth-proxy" string is silently coerced to "stack-cli" with no documentation. If the caller passes a different key prefix in the future, or if this sentinel value is ever renamed, logs will accumulate under the raw API-key value with no warning. Extracting the mapping into a named function makes the contract explicit.

Suggested change
systemPromptId: apiKey === "stack-auth-proxy" ? "stack-cli" : apiKey,
systemPromptId: resolveSystemPromptId(apiKey),
Path: apps/backend/src/lib/ai/loggers/ai-proxy-logger.ts (line 28)

const rawCost = extractCostFromUsage(usage);
runAsynchronouslyAndWaitUntil(logAiQuery({
...common,
stepsJson: serializeSteps(steps),
Collaborator


BUG: logging-only serialization can break a successful AI response.

Walkthrough: the AI request can finish successfully, then logAiQuerySuccess tries to prepare the log row. Because serializeSteps(steps) is evaluated before the fire-and-forget logging promise is handed to runAsynchronouslyAndWaitUntil, any JSON.stringify failure here throws synchronously. I reproduced this with a tool result containing a circular object: the answer path is otherwise successful, but logging throws on the circular structure before the async boundary.

Expected behavior: logging should be isolated from the user-facing success path. Build/serialize the logging payload inside the async logging task, or catch serialization failures and log a safe placeholder.
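A minimal sketch of that isolation, with hypothetical callback names standing in for the project's logging and error-capture helpers:

```ts
// Hypothetical sketch: serialize inside the fire-and-forget task, so a
// JSON.stringify failure (e.g. a circular structure) can never propagate
// into the user-facing success path.
function logSuccessSafely(
  steps: unknown,
  log: (stepsJson: string) => Promise<void>,
  capture: (err: unknown) => void,
): void {
  void (async () => {
    let stepsJson: string;
    try {
      stepsJson = JSON.stringify(steps); // throws on circular structures
    } catch (err) {
      capture(err);
      stepsJson = '"<steps unserializable>"'; // safe placeholder row
    }
    await log(stepsJson);
  })().catch(capture);
}
```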

token,
correlationId: body.correlationId,
});
BigInt(body.qaId),
Collaborator


BUG: invalid qaId becomes a server exception instead of a clean 400.

Walkthrough: the request schema only says qaId is a string, so "abc" passes validation. The handler then calls BigInt(body.qaId), and BigInt("abc") throws during request handling. I reproduced this by invoking the route handler with reviewer auth and { qaId: "abc" }; the failure comes from conversion, not validation.

Expected behavior: validate the decimal/u64 shape before conversion and return a normal schema/client error for malformed IDs.
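A minimal sketch of validating the shape first; the error class stands in for whatever schema/client error type the codebase actually uses:

```ts
// Hypothetical sketch: check the decimal-u64 shape before calling BigInt(),
// so malformed ids become a clean client error instead of a server exception.
class BadRequestError extends Error {}

const U64_MAX = (1n << 64n) - 1n;

function parseQaId(raw: string): bigint {
  if (!/^\d{1,20}$/.test(raw)) {
    throw new BadRequestError(`qaId must be a decimal u64, got: ${raw}`);
  }
  const id = BigInt(raw);
  if (id > U64_MAX) {
    throw new BadRequestError(`qaId out of u64 range: ${raw}`);
  }
  return id;
}
```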

const editor = user.display_name ?? user.primary_email ?? user.id;
const qaId = BigInt(body.qaId);
await callReducerStrict("update_qa_entry", [token, qaId, body.question, body.answer, editor]);
await callReducerStrict("set_qa_published", [token, qaId, body.publish]);
Collaborator


BUG: this update is not atomic.

Walkthrough: editing text and changing publish state are two separate reducer calls. I reproduced the failure by mocking update_qa_entry to succeed and set_qa_published to fail. The route returns an error, but the question/answer mutation has already been committed.

Expected behavior: use a single reducer for “update text plus publish state” so the whole operation either commits or fails together. Partial review edits are hard for reviewers to reason about because the UI reports failure while the data has changed.
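A minimal sketch of the single-reducer shape, where "update_qa_entry_and_publish" is the suggested (not yet existing) reducer name and the call helper is passed in so the sketch stands alone:

```ts
// Hypothetical sketch: one reducer applies both the text edit and the publish
// flag inside a single SpacetimeDB transaction, so the route makes exactly
// one mutating call and partial commits become impossible.
type CallReducer = (reducer: string, args: unknown[]) => Promise<void>;

async function saveQaEntry(
  callReducerStrict: CallReducer,
  token: string,
  qaId: bigint,
  question: string,
  answer: string,
  publish: boolean,
  editor: string,
): Promise<void> {
  await callReducerStrict("update_qa_entry_and_publish", [token, qaId, question, answer, publish, editor]);
}
```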


const token = getEnvVariable("STACK_MCP_LOG_TOKEN");
await conn.reducers.updateHumanCorrection({
await callReducerStrict("upsert_qa_from_call", [
Collaborator


BUG: submitting a correction no longer marks the MCP call reviewed.

Walkthrough: this route only calls upsert_qa_from_call. I reproduced that by mocking callReducerStrict and invoking the handler; the only reducer call was upsert_qa_from_call, with no mark_human_reviewed. If the old correction flow implied review completion, corrected calls will still appear unreviewed unless the reviewer separately clicks mark reviewed.

Expected behavior: either make correction submission mark the source call reviewed in the same logical operation, or intentionally reflect the two-step workflow in the UI/API.

- If the user asks to change styling, colors, or layout, make those changes while preserving functionality.
- Always call the updateDashboard tool with the COMPLETE updated source code — no partial code or diffs.

CHOOSING THE RIGHT TOOL — patchDashboard vs. updateDashboard
Collaborator


BUG: the prompt now introduces patchDashboard, but the same prompt still later forces the old full-rewrite tool.

Walkthrough: this new section teaches the model how to choose between patchDashboard and updateDashboard, but the prompt still ends with You MUST call the updateDashboard tool with the complete source code. I reproduced the conflict by checking the create-dashboard prompt contains both patchDashboard guidance and that forced updateDashboard instruction.

Expected behavior: remove or update the later forced updateDashboard instruction so the strongest instruction matches this new tool-choice guidance.

const edits = parsePatchEdits(item.args);
if (!edits) continue;
const result = applyDashboardPatches(runningSource, edits);
runningSource = result.updatedSource;
Collaborator


BUG: patch application commits partial results when later edits fail.

Walkthrough: I reproduced this with two edits: the first replaces alpha with ALPHA, the second searches for missing text. The result reports a failure for edit 2, but updatedSource still contains edit 1. For AI-generated code edits, partial source is often worse than no source because the user sees a failed operation while the code was still changed.

Expected behavior: apply a patch batch against a draft source and only assign runningSource = draft if every edit succeeds.
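A minimal sketch of that all-or-nothing behavior (the edit shape and names are illustrative):

```ts
// Hypothetical sketch: apply the whole batch to a draft and commit the draft
// only if every edit found its search text; otherwise keep the original source.
type Edit = { search: string, replace: string };

function applyPatchBatch(source: string, edits: Edit[]): { ok: boolean, source: string } {
  let draft = source;
  for (const edit of edits) {
    const idx = draft.indexOf(edit.search);
    if (idx === -1) return { ok: false, source }; // abort: caller keeps the original
    draft = draft.slice(0, idx) + edit.replace + draft.slice(idx + edit.search.length);
  }
  return { ok: true, source: draft };
}
```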

onSave(row.correlationId, question, answer, publish);
Promise.resolve(onSave(row.id, question, answer, publish))
.catch(err => captureError("knowledge-base-save", err));
setEditingId(null);
Collaborator


BUG: failed saves look successful to the reviewer.

Walkthrough: the save starts Promise.resolve(onSave(...)).catch(...), then immediately closes the editor with setEditingId(null). I reproduced this from the component source: a rejected save is only sent to captureError, while the reviewer sees the editor disappear as if the save worked.

Expected behavior: await the save/delete action, keep the editor/dialog open on failure, and show an inline error. Capturing the error is useful for us, but it is not enough feedback for the reviewer performing the action.
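A minimal sketch of the awaited flow; onSave and setEditingId mirror the snippet above, while setSaveError is an assumed state setter for the inline error:

```ts
// Hypothetical sketch: close the editor only after a confirmed save, and keep
// it open with inline feedback when the save fails.
async function handleSave(
  onSave: (id: string, q: string, a: string, publish: boolean) => Promise<void>,
  deps: {
    id: string, question: string, answer: string, publish: boolean,
    setEditingId: (id: string | null) => void,
    setSaveError: (msg: string | null) => void,
  },
): Promise<void> {
  try {
    await onSave(deps.id, deps.question, deps.answer, deps.publish);
    deps.setEditingId(null); // confirmed save: safe to close the editor
  } catch (err) {
    deps.setSaveError(String(err)); // editor stays open with an inline error
  }
}
```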

{ name: "mark_human_reviewed", args: [wrong, "corr", "reviewer"] },
{ name: "unmark_human_reviewed", args: [wrong, "corr"] },
{
name: "update_human_correction",
Collaborator


BUG: this auth test still references a removed reducer.

Walkthrough: the current correction path uses the newer QA reducers, but this test still calls update_human_correction. I reproduced this statically by checking the test source still contains that reducer name. Against the real module this either fails for the wrong reason or no longer tests the current write surface.

Expected behavior: replace this with the current mutating reducer(s), especially the correction/upsert path that actually ships now.

args: [wrong, "corr", "q", "a", false, "reviewer"],
},
{ name: "add_manual_qa", args: [wrong, "q", "a", false, "reviewer"] },
{ name: "delete_qa_entry", args: [wrong, "corr"] },
Collaborator


BUG: this does not prove delete_qa_entry enforces auth.

Walkthrough: delete_qa_entry expects a numeric/u64 QA id, but the test passes "corr". That can fail while decoding the reducer arguments before the token check is reached. So a failing reducer call here does not necessarily mean the wrong token was rejected.

Expected behavior: pass a valid numeric QA id shape with the wrong token, then assert the failure message is the invalid-token path.

expect(enroll.status).toBe(200);
const seedMarker = uniqueMarker("private-mcp-seed");
scope.trackMcpQuestion(seedMarker);
const seed = await niceBackendFetch("/api/latest/internal/mcp-review/add-manual", {
Collaborator


BUG: this seeds the wrong table for the private-table assertion.

Walkthrough: the test wants to prove a stranger cannot see mcp_call_log, but it seeds via /mcp-review/add-manual. Manual Q&A now writes qa_entries, not mcp_call_log. I reproduced this by checking the seed block calls add-manual and does not call log_mcp_call, so the privacy assertion can pass simply because mcp_call_log is empty.

Expected behavior: seed mcp_call_log directly, or first assert the private table contains the seeded row for an authorized caller before checking stranger visibility.
