Summary
Add an LSP-grounded verification step to the pr-review deep/audit tiers that cross-checks each review finding against real codebase semantics (lsp_get_diagnostics, lsp_find_references) before posting. Research shows 58–62% of static-analysis false positives are rejectable via semantic grounding, and 30–42% of AI code review false positives stem from misread context. This directly attacks the trust erosion that auto-approving agents face when they post confident-but-wrong findings.
Market Signal
The "Verify Before You Fix" framework (arXiv:2604.10800, April 2026) demonstrates that execution-grounded validation rejects 58–62% of detector false positives while confirming 67–71% of genuine issues. Disabling grounded validation increases unnecessary repairs by 131.7%. SonarQube launched its Agentic Analysis MCP Server (March 2026 beta) specifically for in-loop code verification. The industry consensus is shifting from "generate then review" to "generate, verify semantically, then review."
User Signal
Discussion #578 specifically calls out using LSP as a "grounding/verification step" in deep/audit tiers. The existing prompts/deep-review.md and prompts/security-audit.md tiers instruct the agent to verify findings, but without semantic tools the agent relies on grep — which returns textual matches including false hits in comments and strings. The cross-engine rubber duck (run_duck) is an adversarial check on reasoning, but not on codebase facts. Bug #574 (>300 changed files hard-fail) shows the pipeline struggles with large PRs where grep-based navigation is most unreliable.
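The gap between textual and semantic matching can be illustrated with a minimal sketch. It uses Python's standard tokenize module as a stand-in for a language server, which is an assumption made purely for illustration: a real LSP lookup resolves symbols across files and scopes, which this does not. The sample source and both helper functions are hypothetical.

```python
import io
import re
import tokenize

# Hypothetical sample file: the name run_duck appears as a definition,
# inside a comment, inside a string literal, and as a call.
SOURCE = '''\
def run_duck(finding):
    # run_duck is described here in a comment
    label = "run_duck"
    return finding

result = run_duck("x")
'''


def textual_hits(source: str, symbol: str) -> list[int]:
    """Line numbers where the symbol appears anywhere, as a grep would report."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return [n for n, line in enumerate(source.splitlines(), 1) if pattern.search(line)]


def identifier_hits(source: str, symbol: str) -> list[int]:
    """Line numbers where the symbol is a real identifier token.

    Comments and string literals are separate token types, so they are skipped.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.start[0]
        for tok in tokens
        if tok.type == tokenize.NAME and tok.string == symbol
    ]


print("textual:", textual_hits(SOURCE, "run_duck"))
print("identifier:", identifier_hits(SOURCE, "run_duck"))
```

The textual search reports the comment and the string literal as hits, while the token-aware search reports only the definition and the call. That difference is the class of false hit the proposal aims to filter out.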
Technical Opportunity
The verification hook slots naturally between the deep review output and the cascade-action synthesis in review-one-pr.sh. Architecture:
New: LSP verification step calls find_references/get_diagnostics on each cited symbol
Findings confirmed by LSP are marked "verified"; unconfirmed ones are flagged as "ungrounded" (not suppressed — preserving recall)
Cascade-action synthesizes only verified findings with higher confidence
This requires no prompt rewrite — just an intermediate verification pass using the same LSP runtime that Discussion #578 proposes. The verification agent uses the existing MCP tool access from engine.sh.
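The labeling step described above might look roughly like the following sketch. Everything here is an assumption for illustration: the Finding shape, the verify_findings name, and the injected find_references callable (a placeholder for whatever LSP tool the runtime from Discussion #578 ends up exposing). It also assumes that "confirmed" means the cited symbol resolves to at least one reference, which is only one possible rule.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one deep-review finding."""
    message: str
    symbol: str
    status: str = "unchecked"


def verify_findings(
    findings: Iterable[Finding],
    find_references: Callable[[str], list[str]],
) -> list[Finding]:
    """Label each finding as verified or ungrounded without dropping any."""
    checked = []
    for finding in findings:
        try:
            refs = find_references(finding.symbol)
        except Exception:
            # Assumption: a failed lookup is treated like an empty result,
            # so the finding is flagged rather than lost.
            refs = []
        status = "verified" if refs else "ungrounded"
        checked.append(replace(finding, status=status))
    return checked


# Usage with a fake in-memory index standing in for the language server.
fake_index = {"parse_config": ["src/config.py:12"]}
findings = [
    Finding("unused return value", "parse_config"),
    Finding("possible null dereference", "load_cfg"),
]
checked = verify_findings(findings, lambda symbol: fake_index.get(symbol, []))
for finding in checked:
    print(finding.status, finding.symbol)
```

Passing the lookup in as a callable keeps the labeling logic testable without a running language server, and returning every finding with a status keeps the pass additive.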
Assessment
| Dimension | Score | Rationale |
| --- | --- | --- |
| Feasibility | med | Requires LSP integration (Discussion #578) as a prerequisite; the verification prompt + scoring logic is new work |
| Impact | high | Directly reduces false positives, the #1 trust erosion vector for auto-approving agents |
| Urgency | med | Depends on LSP pilot landing first; design work can proceed in parallel |
Adversarial Review
Strongest objection: If the LSP server returns false negatives (fails to find a real issue because the language server does not cover the language or the issue is purely semantic/logical), we could suppress valid findings and miss real bugs.
Rebuttal: The design is additive: unverified findings are flagged as "ungrounded" rather than suppressed, preserving recall while boosting precision. The human reviewer sees both verified and ungrounded findings, with confidence scores. Language coverage gaps are bounded — LSP tools gracefully return empty results for unsupported languages, which triggers the "ungrounded" flag rather than a false negative.
Suggested Next Step
Design the verification prompt that takes deep-review findings + LSP tool access and outputs a confidence-scored finding list. Prototype on 10 recent PRs (5 with known false positives) and measure precision/recall delta. Integrate with the existing cascade-action.md synthesis step.
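One way the precision/recall delta could be computed for the prototype is sketched below. The labeled data is invented toy data, not a measurement, and the function name and tuple layout are hypothetical. Recall here is measured only against the true issues present in the finding list, which is an assumption: it says nothing about issues the deep review never raised.

```python
# Each entry: (is_real_issue, status). Toy data for illustration only.
labeled = [
    (True, "verified"),
    (True, "verified"),
    (True, "ungrounded"),
    (False, "ungrounded"),
    (False, "ungrounded"),
    (False, "verified"),
]


def precision_recall(labeled, keep):
    """Precision and recall when only findings with keep(status) are posted."""
    posted = [is_real for is_real, status in labeled if keep(status)]
    total_real = sum(1 for is_real, _ in labeled if is_real)
    true_positives = sum(posted)
    precision = true_positives / len(posted) if posted else 0.0
    recall = true_positives / total_real if total_real else 0.0
    return precision, recall


# Baseline posts every finding; the filtered run posts only verified ones.
base_p, base_r = precision_recall(labeled, lambda status: True)
ver_p, ver_r = precision_recall(labeled, lambda status: status == "verified")

print(f"precision delta: {ver_p - base_p:+.3f}")
print(f"recall delta: {ver_r - base_r:+.3f}")
```

Comparing the two runs on the same labeled set shows how much recall would be given up if ungrounded findings were hidden, which is the trade-off the "flag, do not suppress" design is meant to avoid.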
🤖 Proposed by Mary (BMAD Strategic Business Analyst) · companion to Discussion #578