💡 Token Cost Observatory LSP Metrics: Measuring the ROI of Semantic Code Intelligence #596

2026-06-11T23:30:16Z

github-actions[bot]
Bot Jun 11, 2026

Summary

Extend the Token Cost Observatory (token-metrics.sh, token_report.sh) to capture LSP-specific metrics: tool-call counts by type (LSP navigation vs grep/read), tokens consumed per navigation approach, and a "grounding ratio" (findings verified by LSP / total findings). This provides the data needed for a go/no-go decision on fleet-wide LSP rollout and ongoing cost optimization — ensuring the LSP pilot is measured, not just deployed.

Market Signal

ManoMano's Project AEGIS benchmark showed Serena-equipped agents used 4 subagents vs 12 for vanilla Claude on the same task, yet actual token costs were similar ($27.30 vs $23.54). The nuance matters: raw token count alone does not capture quality. A cheaper run that fails is worse than a slightly more expensive run that succeeds. The industry is moving toward quality-adjusted cost metrics (tokens per correct finding, cost per verified fix) rather than raw token counts. MCPBench and MCPAgentBench both evaluate tool-use efficiency, not just task success, a sign that the benchmarking discipline is maturing.

User Signal

Discussion #578 explicitly recommends measuring "tokens/run, tool-call count, review precision — pilot vs control." The Token Cost Observatory (#332, #464) already captures per-call JSONL records with workflow, tier, model, and input/output/cache token counts. The ET (Effective Token) formula weights output 4x input, so LSP can lower ET on both sides: less output (fewer false findings to write up) and less input (targeted navigation instead of loading whole files into context).
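To make the 4x weighting concrete, here is a minimal sketch in Python (used only for illustration; the Observatory itself is shell). It assumes ET = input + 4 × output and leaves cache tokens out, because this proposal does not say how they are weighted. Under that assumption, trimming 2,500 output tokens saves as much ET as trimming 10,000 input tokens.

```python
def effective_tokens(input_tokens: int, output_tokens: int,
                     output_weight: float = 4.0) -> float:
    """Effective Tokens: input plus output weighted 4x.

    Assumption: cache tokens are omitted because their weighting in the
    Observatory's ET formula is not stated in this proposal.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens + output_weight * output_tokens
```

For example, `effective_tokens(40_000, 5_000)` and `effective_tokens(50_000, 2_500)` are equal, which is why reducing output (fewer false findings) pays off four times faster per token than reducing input.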

Technical Opportunity

token-metrics.sh's emit_token_record() already logs per-call JSONL. The proposed extensions:

mcp_tool_calls — optional JSON object mapping tool names to call counts (e.g., {"lsp_find_references": 4, "grep": 2})
lsp_enabled — boolean flag indicating whether the run had MCP/LSP tools available
grounding_ratio — float (verified findings / total findings), computed by the LSP verification step (Discussion #594)
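A sketch of how the optional fields could be attached to a per-call record, written in Python rather than the shell of emit_token_record(). The function name build_token_record and the base key names (input_tokens, output_tokens, cache_tokens, and so on) are hypothetical stand-ins for the existing schema. The design point is that absent fields are omitted rather than written as null, so records from runs without MCP tools keep their current shape.

```python
import json
from typing import Mapping, Optional


def build_token_record(
    workflow: str,
    tier: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_tokens: int,
    mcp_tool_calls: Optional[Mapping[str, int]] = None,
    lsp_enabled: Optional[bool] = None,
    grounding_ratio: Optional[float] = None,
) -> str:
    """Serialize one per-call record as a single JSONL line.

    Base key names are hypothetical; the three LSP fields are only
    written when supplied, so older readers see unchanged records.
    """
    record = {
        "workflow": workflow,
        "tier": tier,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cache_tokens": cache_tokens,
    }
    if mcp_tool_calls is not None:
        record["mcp_tool_calls"] = dict(mcp_tool_calls)
    if lsp_enabled is not None:
        record["lsp_enabled"] = lsp_enabled
    if grounding_ratio is not None:
        # A ratio of verified to total findings must lie in [0, 1].
        if not 0.0 <= grounding_ratio <= 1.0:
            raise ValueError("grounding_ratio must be in [0, 1]")
        record["grounding_ratio"] = grounding_ratio
    # Compact separators keep each record on one short line.
    return json.dumps(record, separators=(",", ":"))
```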

The render_* functions in token_report.sh would gain a new "LSP Efficiency" section comparing:

LSP-enabled vs LSP-disabled runs (same tier, same model)
Tool-call distribution (navigation tools vs raw file reads)
ET per verified finding
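The three comparisons could be computed in a single pass over the records. This Python sketch groups by (tier, model, lsp_enabled), sums ET (input + 4 × output, cache omitted), buckets tool calls into navigation vs raw reads, and divides ET by verified findings. The tool classification (an lsp_ prefix means navigation; grep/read-style names mean raw access) and the per-record verified_findings count are assumptions: the proposal defines grounding_ratio but not a raw verified count. Records without lsp_enabled are treated as LSP-disabled.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

# Hypothetical classification of tool names.
RAW_TOOLS = {"grep", "read", "read_file", "cat"}


def classify_tool(name: str) -> str:
    if name.startswith("lsp_"):
        return "navigation"
    if name in RAW_TOOLS:
        return "raw"
    return "other"


def lsp_efficiency(records: Iterable[Mapping]) -> dict:
    """Summarize records per (tier, model, lsp_enabled) group."""
    groups = defaultdict(lambda: {"et": 0.0, "verified": 0,
                                  "navigation": 0, "raw": 0, "other": 0})
    for r in records:
        # Legacy records lacking lsp_enabled fall into the disabled group.
        key = (r["tier"], r["model"], bool(r.get("lsp_enabled", False)))
        g = groups[key]
        g["et"] += r["input_tokens"] + 4 * r["output_tokens"]
        g["verified"] += r.get("verified_findings", 0)
        for tool, count in (r.get("mcp_tool_calls") or {}).items():
            g[classify_tool(tool)] += count
    summary = {}
    for key, g in groups.items():
        # Undefined (None) when a group produced no verified findings.
        et_per_verified: Optional[float] = (
            g["et"] / g["verified"] if g["verified"] else None
        )
        summary[key] = {**g, "et_per_verified_finding": et_per_verified}
    return summary
```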

The existing per-repo breakdown in token_report.sh already supports pilot-vs-control comparison: run the pilot on one repo (.github-private) while the fleet continues without LSP, then compare the Observatory reports side-by-side.
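The pilot-vs-control comparison can likewise be sketched as a filter plus a split by repo. The repo key name and the function pilot_vs_control are hypothetical; filtering to a single tier and model keeps the comparison like-for-like. The mean here is per call record, not per run; a per-run comparison would additionally need a run identifier in each record.

```python
from statistics import mean
from typing import Iterable, Mapping


def pilot_vs_control(records: Iterable[Mapping], pilot_repo: str,
                     tier: str, model: str) -> dict:
    """Mean ET per call record for the pilot repo vs all other repos.

    Only records matching the given tier and model are compared.
    ET is assumed to be input + 4 * output (cache tokens omitted).
    """
    arms = {"pilot": [], "control": []}
    for r in records:
        if r["tier"] != tier or r["model"] != model:
            continue
        et = r["input_tokens"] + 4 * r["output_tokens"]
        arms["pilot" if r["repo"] == pilot_repo else "control"].append(et)
    # None signals an arm with no matching records.
    return {arm: (mean(v) if v else None) for arm, v in arms.items()}
```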

Assessment

Dimension	Score	Rationale
Feasibility	high	Extends existing JSONL schema + rendering functions with optional fields
Impact	med	Enables data-driven go/no-go for fleet-wide LSP; prevents rollout based on vendor narratives alone
Urgency	med	Should be ready before the LSP pilot begins so day-1 data is captured

Adversarial Review

Strongest objection: Adding per-tool-call metrics increases JSONL artifact size and report complexity. The grounding ratio requires parsing agent output to count findings, which is fragile. Comparing LSP vs non-LSP runs requires A/B capability that does not exist in the current pipeline.

Rebuttal: JSONL growth is minimal (one optional JSON object field per record — ~100 bytes). The grounding ratio is computed from the verification step's structured output, not from parsing free-text agent responses. A/B comparison requires no new infrastructure: run the pilot on one repo while the fleet continues without LSP, then compare the Observatory's per-repo breakdown. The existing annotate_records() function can filter by lsp_enabled to split the data.
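A sketch of computing the grounding ratio from structured verification output rather than free text. The JSON schema used here (a findings array whose items carry a boolean verified flag) is a hypothetical placeholder until Discussion #594 defines the real one. Returning None for zero findings keeps an empty run from being reported as either fully grounded or fully ungrounded.

```python
import json
from typing import Optional


def grounding_ratio(verification_json: str) -> Optional[float]:
    """Verified findings / total findings from a structured report.

    Assumes a hypothetical schema:
        {"findings": [{"id": ..., "verified": true|false}, ...]}
    """
    report = json.loads(verification_json)
    findings = report.get("findings", [])
    if not findings:
        return None
    # Only an explicit boolean true counts as verified.
    verified = sum(1 for f in findings if f.get("verified") is True)
    return verified / len(findings)
```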

Suggested Next Step

Add optional mcp_tool_calls and lsp_enabled fields to emit_token_record() in token-metrics.sh. Update token_report.sh to render an "MCP Tool Usage" section when MCP data is present. Define the grounding ratio metric schema and add it to the weekly report template. Target: metrics infrastructure ready before LSP pilot begins.
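For the conditional report section, a minimal Python sketch of the rendering logic: it aggregates mcp_tool_calls across records and returns an empty string when no record carries MCP data, so reports for runs without LSP stay unchanged. The function name and the Markdown table layout are assumptions about token_report.sh's output, not its actual render_* interface.

```python
from collections import Counter
from typing import Iterable, Mapping


def render_mcp_tool_usage(records: Iterable[Mapping]) -> str:
    """Return an "MCP Tool Usage" section, or "" if no MCP data exists."""
    totals = Counter()
    seen = False
    for r in records:
        calls = r.get("mcp_tool_calls")
        if calls:
            seen = True
            totals.update(calls)  # adds per-tool counts across records
    if not seen:
        return ""
    lines = ["## MCP Tool Usage", "", "| Tool | Calls |", "| --- | ---: |"]
    # Most-used tools first; ties broken alphabetically for stable output.
    for tool, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"| {tool} | {count} |")
    return "\n".join(lines) + "\n"
```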

🤖 Proposed by Mary (BMAD Strategic Business Analyst) · companion to Discussion #578
