feat(risk): default to recent-N drain; opt-in full backfill#2889
Conversation
Previously every new chat message and every policy edit triggered a full backfill of every unanalyzed message for every enabled policy. That work scales with project history and is wasted effort when the operator only cares about ongoing scanning. Now ingest signals and policy create/update default to the most recent 100 unanalyzed messages per policy. Explicit backfill is opt-in via the existing Trigger endpoint, which accepts a new optional `limit` field (omit or 0 for unbounded). The Progress sheet exposes both "Backfill all messages" and "Backfill last N". The workflow gained a MaxMessages param and a typed signal payload so concurrent signals can escalate a bounded run to unbounded mid-flight (0 wins; larger positive wins over smaller). Pending signals at end of a cycle are folded into the next ContinueAsNew. FetchUnanalyzedMessageIDs now orders by cm.id DESC. uuidv7 is k-sortable, so the PK btree gives us "recent first" without a new index and Postgres can stop scanning early once LIMIT is satisfied.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🚀 Preview Environment (PR #2889)Preview URL: https://pr-2889.dev.getgram.ai
Gram Preview Bot |
…aming Two related risk-policy polish items. 1. triggerRiskAnalysis: default `limit` to 100 (the recent-N drain budget). Callers can pass 0 explicitly to request a full backfill. The previous "omit = unbounded" was a footgun because the most common Progress-tab use case is "scan recent messages, not all history". 2. Auto-naming: stop leaking detection library names. Policy authors think in what is detected (secrets, PII, prompt injection) not how (gitleaks, presidio). The LLM prompt was passing raw source identifiers, so generated names regularly regurgitated the library name (e.g. "Block Gitleaks Secrets"). Translate sources to user facing category labels and tell the model not to mention internal tool names. Closes AGE-2378. Also: tighten the FetchUnanalyzedMessageIDs comment. The previous wording credited the primary-key btree for the recent-first ordering, but EXPLAIN ANALYZE shows the composite (project_id, id) index is what satisfies ORDER BY cm.id DESC via an Index Only Scan Backward (no Sort node, ~2ms for LIMIT 100 on 15k rows).
Each SignalNewMessages call round-trips to Temporal (~20ms). The observer fires on the chat-message hot path, and a project with N enabled risk policies was paying N × 20ms serially. Fan out the per-policy signals so total latency is the slowest signal, not the sum.
The dashboard Input wrapper passes (value: string) to onChange, not the raw React event. Caught by tsc in CI.
Goa's Default(100) on triggerRiskAnalysis.limit converts an omitted field into 100 on the server. The Backfill all messages button was calling onTrigger() with no argument, which serializes the field as undefined, so the server applied the default and the "all messages" button silently behaved as "last 100". Pass 0 explicitly to request the unbounded backfill.
Daniel caught this in PR review: SignalWithStart leaves the triggering signal in the new run's channel. Our end-of-cycle drain consumed it and treated it as "new work arrived", so a bounded run with MaxMessages=100 would ContinueAsNew forever with the same params and process ~Nx100 messages per click instead of exactly 100. Drain the channel at the top of the workflow so the start signal is absorbed into the initial budget. End-of-cycle drain only picks up signals that arrived *after* this run started — those are real "new work" requests that warrant ContinueAsNew with a refreshed budget. Added TestDrainWorkflow_StartSignalDoesNotSelfLoop, which fails on the pre-fix code (workflow exits via ContinueAsNew with only the start signal in the channel) and passes after this change. Replaced the older TestDrainWorkflow_EmptyDrainWaitsForSignal with TestDrainWorkflow_SignalDuringDrainContinuesAsNew which uses a >0 delay so the signal arrives after the start drain — modeling the real "signal landed mid-cycle" case the original test was trying to cover. Also trim the verbose comment paragraphs flagged in the review.
This comment has been minimized.
This comment has been minimized.
Each call to TriggerRiskAnalysis bumped the policy version. The analyzed-count display filters by current version, so a 'Backfill last 100' followed by 'Backfill last 400' would show 400 analyzed — not the expected 500 — because the first 100 results now sit under v=N-1 while the second batch writes to v=N. The version bump existed because Trigger used to be the only way to force re-analysis. Now that we have an explicit limit parameter (limit=0 means full backfill), the bump is the wrong default: it silently overwrites the user's mental model of accumulating backfills. Policy edits that change detection rules still bump the version via UpdateRiskPolicy's WHEN clause, which is when re-analysis is actually warranted. Renamed TestTriggerRiskAnalysis_BumpsVersion to ...DoesNotBumpVersion and extended it to cover two sequential triggers.
This reverts commit 338112b.
…test - Split the drain_risk_analysis const block so each group has matching typing (SA9004: only the first constant was explicitly typed). - Drive the mid-cycle signal from inside the fetch activity so it is reliably queued before the workflow's end-of-cycle drain check. The prior 1ms RegisterDelayedCallback raced with activity completion in the test scheduler and failed in CI.
|
@claude review once |
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, comment @claude review on this pull request to trigger a review.
|
Claude encountered an error after 0s —— View job I'll analyze this and get back to you. |
Closes AGE-2212 and AGE-2378.
Why
Every new chat message and every policy edit triggered a full drain of all unanalyzed messages for that policy. The work isn't redundant (the query already filters by
risk_policy_version), but it's expensive: each message runs through gitleaks + Presidio + optional prompt-injection scanning, and a single new message could kick off thousands of analyses on the hot path.AGE-2212 asks us to default the ingest path to "scan recent traffic, not all history". Operators who want a historical re-scan opt in explicitly from the Progress tab.
What changed
OnMessagesStored),CreateRiskPolicy, andUpdateRiskPolicycap their drain at the most recent 100 unanalyzed messages (DefaultRecentMessagesBudget).TriggerRiskAnalysisaccepts an optionallimit(defaults to 100). Pass0to backfill everything. The Progress sheet has bothBackfill all messages(sends0) andBackfill last Nwith a number input. Each click still bumps the policy version (re-analysis semantic preserved); successive bounded clicks therefore replace rather than accumulate, which matches "trigger means re-do".DrainRiskAnalysisParams.MaxMessages+SignalNewMessagesPayload.MaxMessages. AMaxMessages>0run fetches up to that many and stops;0drains to empty. Concurrent signals during a bounded run escalate the next cycle (0wins; larger positive wins over smaller).ContinueAsNewed forever with the same params. Covered byTestDrainWorkflow_StartSignalDoesNotSelfLoop(fails on the pre-fix code, passes with the fix) andTestDrainWorkflow_SignalDuringDrainContinuesAsNew(mid-cycle escalation still works).OnMessagesStoredfans out per-policy signals so the chat hot path isn'tN × 20ms.FetchUnanalyzedMessageIDsorders bycm.id DESC. uuidv7 is k-sortable so the existing composite(project_id, id)index satisfies the order via Index Only Scan Backward — no Sort node, no new index.EXPLAIN ANALYZE: LIMIT 100 over 15k rows runs in ~2ms.generatePolicyNametranslates sources to user-facing category labels before prompting the model, and the prompt instructs the model not to mention internal tool or library names.fallbackPolicyNameuses the same mapping.Demo
Default behavior + full backfill on a fresh policy
A new
Demo Secrets Onlypolicy is created via the API. The create signal fires the default recent-N drain so the Progress sheet opens at exactly 100 of 15,300 analyzed. ClickingBackfill all messagesthen drains to completion — counter climbs all the way to 15,300 / 15,300.https://github.com/speakeasy-api/gram/releases/download/_gh-attach-assets/recording2-y4vhbb.mp4
Bounded explicit backfill
Two clicks of
Backfill last Non a fresh policy, first at N=100 then N=400. Trigger bumps the policy version on each call, so the second click re-scans under a new version — final state settles at exactly 400 of 15,300 at version 3, not 15,300. Demonstrates the bound is honored.https://github.com/speakeasy-api/gram/releases/download/_gh-attach-assets/recording-gkptb5.mp4
Test plan
go test ./internal/background/... ./internal/risk/...mise lint:server,cd client/dashboard && pnpm lintEXPLAIN ANALYZEconfirms Index Only Scan Backward on the composite index for both LIMIT 100 and LIMIT 20000