Skip to content

feat: show effective-token delta per MCP tool call in agent log#33629

Merged
pelikhan merged 2 commits into
mainfrom
copilot/improve-agent-log-rendering
May 20, 2026
Merged

feat: show effective-token delta per MCP tool call in agent log#33629
pelikhan merged 2 commits into
mainfrom
copilot/improve-agent-log-rendering

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 20, 2026

Each MCP tool call result grows the LLM context window by some amount, but the exact cost was invisible. This adds a ΔET (effective-token delta) annotation to each tool call by correlating rpc-messages.jsonl timestamps with token-usage.jsonl API call data.

Algorithm

For each tool call at timestamp T:

  • prev = last token-usage.jsonl entry with ts < T (the API call that made the tool decision)
  • next = first entry with ts > T (the API call that consumed the result)
  • ΔET = effectiveTokens(next) − effectiveTokens(prev)

Changes

Go CLI (pkg/cli/)

  • audit_report.go: adds EffectiveTokenDelta int to MCPToolCall
  • token_usage.go: adds readTokenUsageEntries (raw sorted JSONL reader) and correlateToolCallsWithTokenDelta (correlation using existing ET weights/multipliers)
  • gateway_logs_mcp.go: calls correlation after building tool calls for both rpc-messages.jsonl and gateway.jsonl paths
  • audit_report_render_tools.go: adds a Tool Call Timeline (Effective Token Δ) table when deltas are available

JS step summary (actions/setup/js/parse_mcp_gateway_log.cjs)

  • Adds computeToolCallTokenDeltas(tokenUsageContent, requests) using the existing computeEffectiveTokens from effective_tokens.cjs
  • Adds optional tokenDeltas parameter to generateRpcMessagesSummary; renders a ΔET column in the REQUEST table when present
  • main() reads TOKEN_USAGE_PATH, computes deltas, passes them into the summary — non-fatal if file is missing

Example output (REQUEST table in step summary)

Time Server Tool / Method ΔET
2026-05-19 21:10:54Z safeoutputs push_to_pull_request_branch +622
2026-05-19 21:10:57Z safeoutputs update_pull_request +558
2026-05-19 21:11:00Z safeoutputs update_pull_request +694

Correlates rpc-messages.jsonl timestamps with token-usage.jsonl to compute
the effective-token delta introduced by each MCP tool call result.

Go CLI: adds a "Tool Call Timeline (Effective Token Δ)" table to the MCP
tool usage section when delta data is available.

JS step summary: adds a ΔET column to the REQUEST table in the MCP Gateway
Activity section.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title feat: display effective-token delta per tool call in agent log feat: show effective-token delta per MCP tool call in agent log May 20, 2026
Copilot AI requested a review from pelikhan May 20, 2026 21:03
@pelikhan pelikhan marked this pull request as ready for review May 20, 2026 22:26
Copilot AI review requested due to automatic review settings May 20, 2026 22:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds visibility into how much each MCP tool call result increases the LLM context by annotating tool calls with an effective-token delta (ΔET), computed by correlating tool-call timestamps with adjacent LLM API calls in token-usage.jsonl.

Changes:

  • Go CLI: parse/sort raw token usage entries and correlate tool calls with ΔET; store it on MCPToolCall.
  • Go CLI: render a “Tool Call Timeline (Effective Token Δ)” table when delta data exists.
  • GitHub Actions JS summary: compute per-request ΔET and optionally add a ΔET column to the REQUEST table; adds unit tests.
Show a summary per file
File Description
pkg/cli/token_usage.go Adds token-usage JSONL reader and correlates MCP tool calls with effective-token deltas.
pkg/cli/token_usage_test.go Adds unit tests for the Go correlation logic.
pkg/cli/gateway_logs_mcp.go Invokes tool-call/token-usage correlation when building MCP tool call records.
pkg/cli/audit_report.go Extends MCPToolCall with EffectiveTokenDelta for JSON/reporting.
pkg/cli/audit_report_render_tools.go Renders a tool-call timeline table including ΔET when available.
actions/setup/js/parse_mcp_gateway_log.cjs Computes per-request ΔET from token-usage.jsonl and renders an optional ΔET column in the REQUEST table.
actions/setup/js/parse_mcp_gateway_log.test.cjs Adds unit tests for JS ΔET computation and rendering.
.changeset/patch-agent-log-token-delta.md Declares a patch release note describing the new ΔET annotations.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

pkg/cli/token_usage.go:556

  • The prev/next search scans the entire etEntries slice for every tool call (O(toolCalls × tokenUsageEntries)) even though etEntries is sorted. This can become slow on large runs. Consider using sort.Search (binary search) to find the insertion point for callTS and derive prev/next indices, or a two-pointer walk if toolCalls are chronological, and break early once nextIdx is found.
		// Find prev (last entry with ts < callTS) and next (first entry with ts > callTS)
		prevIdx := -1
		nextIdx := -1
		for j, e := range etEntries {
			if e.ts.Before(callTS) {
				prevIdx = j // keep updating to get the last one before callTS
			} else if e.ts.After(callTS) && nextIdx == -1 {
				nextIdx = j // first entry after callTS
			}
		}
  • Files reviewed: 8/8 changed files
  • Comments generated: 4

Comment thread pkg/cli/token_usage.go
Comment on lines +514 to +515
// Resolve weights once for all entries
multipliers, classWeights := resolveEffectiveWeights(nil)
Comment on lines +81 to +83
// Correlate tool calls with effective-token deltas from token-usage.jsonl
tokenUsageFile := findTokenUsageFile(logDir)
toolCalls = correlateToolCallsWithTokenDelta(toolCalls, tokenUsageFile)
Comment on lines +142 to +145
ts := tc.Timestamp
if len(ts) > 19 {
ts = ts[:19] + "Z"
}
Comment on lines +619 to +633
for (let i = 0; i < requests.length; i++) {
const req = requests[i];
if (!req.timestamp) continue;
const callTs = new Date(req.timestamp).getTime();
if (isNaN(callTs)) continue;

let prevIdx = -1;
let nextIdx = -1;
for (let j = 0; j < etEntries.length; j++) {
if (etEntries[j].ts < callTs) {
prevIdx = j; // keep updating to get the last one before callTs
} else if (etEntries[j].ts > callTs && nextIdx === -1) {
nextIdx = j; // first entry after callTs
}
}
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Review Summary

This PR adds effective-token delta tracking for MCP tool calls, which is a valuable observability improvement. However, there are 4 issues that should be addressed before merging:

Issues Found

  1. Performance inefficiency in JavaScript (line 633, parse_mcp_gateway_log.cjs) — O(n2) nested loop can be optimized to O(n log n) using binary search
  2. Inconsistent token weight handling (lines 515 & 547, token_usage.go) — Ignores custom weights from aw_info.json, causing deltas to be inconsistent with other parts of the report
  3. Redundant file operations (line 83, gateway_logs_mcp.go) — findTokenUsageFile called twice instead of once
  4. Timestamp rendering issue (line 145, audit_report_render_tools.go) — String slicing can misrepresent non-UTC timestamps and produce invalid output

All four issues were previously identified by the automated reviewer and remain valid concerns.

What Works Well

✅ Comprehensive test coverage for both Go and JavaScript implementations
✅ Clean separation of concerns (correlation logic isolated in dedicated functions)
✅ Proper error handling (non-fatal when token-usage.jsonl is missing)
✅ Clear algorithm documentation in both code and PR description

Recommendation

Address the 4 flagged issues, particularly the custom weights inconsistency (#2) which affects correctness, and the performance issue (#1) which could be noticeable on runs with many tool calls.

🔎 Code quality review by PR Code Quality Reviewer · ● 3.4M

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose and /tdd based on the feature addition with algorithmic implementation and test coverage.

Key Themes

Performance bottleneck — Both JS and Go implementations use O(n·m) nested loops instead of binary search on already-sorted data. For audit logs with hundreds of tool calls and API entries, this will cause noticeable performance degradation. Since both implementations already sort their data arrays, switching to binary search (O(log m) per lookup) is straightforward and drops overall complexity from O(n·m) to O(n·log m).

Missing edge case tests — The test suites cover happy paths well but don't document behavior for:

  • Tool call timestamp exactly equal to an API call timestamp (current logic uses strict < / >, so exact matches return delta=0)
  • Negative deltas when effective tokens decrease (cache-heavy follow-ups can reduce billable tokens)

These edge cases should have explicit tests to prevent future regressions and clarify intended behavior.

Positive Highlights

Algorithm documentation — Both implementations have clear docstrings explaining the prev/next correlation logic
Graceful degradation — Missing token-usage.jsonl is non-fatal; delta column is simply omitted
Consistent cross-language implementation — JS and Go versions use identical logic, making maintenance easier
Good test coverage for sequential cases — The multi-call test scenarios validate correct delta assignment

Verdict

Requesting changes to address the performance issue before merge. The binary search optimization is a straightforward improvement that prevents future performance complaints. Edge case tests are recommended but not blocking.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 5.9M

Comments that could not be inline-anchored

actions/setup/js/parse_mcp_gateway_log.cjs:75

[/diagnose] This nested loop creates O(n·m) performance where n = requests.length and m = etEntries.length. For workflows with hundreds of tool calls and API entries, this will cause noticeable lag.

Optimization: Since etEntries is already sorted by timestamp (line 64), use binary search to find prevIdx and nextIdx in O(log m) instead of O(m):

for (let i = 0; i &lt; requests.length; i++) {
  const req = requests[i];
  if (!req.timestamp) continue;
  const callTs = new…

</details>

<details><summary>pkg/cli/token_usage.go:513</summary>

**[/diagnose]** Same O(n·m) performance issue as the JavaScript implementation. The Go version scans all `etEntries` for each tool call.

**Optimization**: Use `sort.Search` for binary search since `etEntries` is already sorted:

```go
for i, tc := range updated {
    callTS, ok := parseTokenUsageTimestamp(tc.Timestamp)
    if !ok {
        continue
    }

    // Binary search for first entry after callTS
    nextIdx := sort.Search(len(etEntries), func(j int) bool {
        return etEntries[j].

</details>

<details><summary>actions/setup/js/parse_mcp_gateway_log.test.cjs:198</summary>

**[/tdd]** Missing edge case tests for timestamp boundary conditions. Consider adding:

```javascript
test(&quot;handles tool call with timestamp equal to API call&quot;, () =&gt; {
  // What happens when callTs === etEntries[i].ts? 
  // Current logic uses &lt; and &gt;, so exact match won&#39;t find prev or next
  const tokenContent = [
    tuLine(&quot;2026-05-19T21:10:00.000Z&quot;, 1000, 50),
    tuLine(&quot;2026-05-19T21:10:05.000Z&quot;, 1500, 80) // exact match
  ].join(&quot;\\n&quot;);
  const requests = [req(&quot;2026-05-19T21:10:05.000Z&quot;…

</details>

<details><summary>pkg/cli/token_usage_test.go:544</summary>

**[/tdd]** The Go tests cover happy path well, but missing the same edge cases as the JS tests. Add:

```go
t.Run(&quot;tool call timestamp equal to API call timestamp&quot;, func(t *testing.T) {
    tmpDir := testutil.TempDir(t, &quot;token-delta-exact&quot;)
    filePath := filepath.Join(tmpDir, &quot;token-usage.jsonl&quot;)
    // Tool call at exactly the same timestamp as an API call
    content := `{&quot;timestamp&quot;:&quot;2026-05-19T21:10:00.000Z&quot;,&quot;model&quot;:&quot;unknown&quot;,&quot;provider&quot;:&quot;test&quot;,&quot;input_tokens&quot;:1000,&quot;output_tokens&quot;:50,&quot;cache…

</details>

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Code Quality Reviewer completed the code quality review.

No new issues found beyond those already identified by previous reviewer. All 4 existing review comments are valid and should be addressed: (1) custom token weights not propagated to correlateToolCallsWithTokenDelta, (2) duplicate findTokenUsageFile calls, (3) unsafe timestamp string truncation, (4) O(n2) nested loop in JS that could use binary search.

… call

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

This comment has been minimized.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 13 tests verify behavioral contracts with comprehensive edge case coverage.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

✅ Design Decision Gate — ADR Verified

The implementation in this PR aligns with the linked Architecture Decision Record.

ADR reviewed: ADR-33629: Effective-Token Delta per MCP Tool Call via Timestamp Correlation

Verification Summary

The ADR specifies a timestamp-bracketing algorithm (ΔET = effectiveTokens(next) − effectiveTokens(prev)) computed independently in Go and JS, reusing existing effective-token weight routines, with graceful degradation when token-usage.jsonl is absent. The diff faithfully implements every MUST requirement from the normative section:

Per-requirement conformance check

Delta Computation

  • MUST bracket each tool call by prev (last ts < T) and next (first ts > T) — implemented in actions/setup/js/parse_mcp_gateway_log.cjs and pkg/cli/token_usage.go.
  • MUST reuse computeEffectiveTokens (JS) / computeModelEffectiveTokensWithWeights (Go) — confirmed in both correlation routines.
  • MUST NOT assign delta when prev or next is missing — both implementations guard with if (prevIdx === -1 || nextIdx === -1) continue (JS) and equivalent Go check.
  • MUST NOT render ΔET ≤ 0if (delta > 0) / if delta > 0 gates on both sides.
  • MUST be silent and non-fatal on missing/malformed token-usage.jsonl — JS wraps readFileSync in try/catch, Go logs and returns input unchanged.

Rendering

  • MUST render a separate Tool Call Timeline (Effective Token Δ) table in Go CLI — pkg/cli/audit_report_render_tools.go.
  • MUST render ΔET column in JS REQUEST table only when at least one positive delta exists — parse_mcp_gateway_log.cjs gates on hasDeltas.
  • MUST display - for requests without a delta when column is present — JS renders delta ? +N : -.
  • SHOULD format with leading + and thousands separators — both implementations use + prefix and locale/console formatting.

Data Model

  • MUST add EffectiveTokenDelta int with effective_token_delta,omitemptypkg/cli/audit_report.go.
  • MUST return new slice from correlateToolCallsWithTokenDelta (no input mutation) — token_usage.go uses make + copy.
  • MUST invoke correlation for both rpc-messages.jsonl and gateway.jsonl paths — pkg/cli/gateway_logs_mcp.go calls it on both branches.
  • MUST return a Map keyed by request index, omitting ≤ 0 entries — computeToolCallTokenDeltas in parse_mcp_gateway_log.cjs.

Unit tests cover the bracketing algorithm in both languages (TestCorrelateToolCallsWithTokenDelta in pkg/cli/token_usage_test.go, computeToolCallTokenDeltas describe block in parse_mcp_gateway_log.test.cjs), including the no-prev, no-next, multi-call, and empty-input cases called out in the ADR's Consequences.

The design decision has been recorded and the implementation follows it. 🏗️

References: §26193570741

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 3.5M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on this being a new feature that adds observable behavior (rendering effective-token deltas in agent logs).

Key Themes

Test Coverage (✅ Strong)

  • Both Go and JS implementations have comprehensive unit tests covering edge cases
  • Tests follow table-driven patterns with descriptive names that read as specifications
  • Edge cases covered: no-prev, no-next, empty inputs, multi-call scenarios

Architecture Fit (✅ Good)

  • Timestamp-bracketing algorithm is well-documented and reasonable for the use case
  • Graceful degradation when token-usage.jsonl is missing (non-fatal, additive feature)
  • Reuses existing effective-token computation logic (good DRY principle)
  • Excellent ADR documenting alternatives considered and trade-offs

Areas for Minor Improvement

  1. Algorithm Performance — O(n*m) complexity in the nested loop (lines 604-609 in Go, 75-80 in JS). Could be optimized to O(n log m) with binary search if needed for large logs.
  2. Missing Edge Case Tests — Clock skew / out-of-order timestamps and concurrent tool calls at identical timestamps not explicitly tested.
  3. Code Duplication — Bracketing algorithm is duplicated in Go and JS with slight implementation differences (acknowledged in ADR as acceptable trade-off).

Positive Highlights

  • Outstanding ADR — Clear narrative explaining context, decision, alternatives, and consequences. The normative specification section is excellent.
  • Defensive immutability — Go implementation creates a new slice rather than mutating input (line 593).
  • Non-fatal failure mode — Missing token-usage.jsonl doesn't break the feature, just omits the delta column.
  • Clear documentation — JSDoc and Go comments explain the algorithm well.
  • Consistent weight computation — Both implementations reuse existing effective-token logic for consistency.

Verdict

COMMENT — No blocking issues. The feature is well-tested, fits the architecture cleanly, and includes excellent documentation. The inline comments suggest optional future optimizations and additional edge case tests, but none are critical for launch.

Great work on the ADR — it's a model example of documenting design decisions! 🎯

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 6.6M

Comments that could not be inline-anchored

pkg/cli/token_usage.go:609

[/zoom-out] This nested loop creates O(n*m) complexity where n = tool calls and m = token usage entries. For typical runs this is fine, but consider optimizing if you observe performance issues with large logs.

Optimization approach (optional future work): Since etEntries is already sorted, you could use binary search to find prevIdx and nextIdx in O(log m) time, reducing overall complexity to O(n log m).

// Future optimization example:
prevIdx := sort.Search(len(etEntries</details>

<details><summary>actions/setup/js/parse_mcp_gateway_log.cjs:79</summary>

**[/zoom-out]** Same O(n*m) complexity issue as the Go implementation. The nested loop searches all `etEntries` for each request.

Consider using binary search here too for consistency and performance:

```js
// Binary search for prevIdx
let left = 0, right = etEntries.length - 1;
while (left &lt;= right) {
  const mid = Math.floor((left + right) / 2);
  if (etEntries[mid].ts &lt; callTs) {
    prevIdx = mid;
    left = mid + 1;
  } else {
    right = mid - 1;
  }
}

Not blocking for initial impl…

docs/adr/33629-effective-token-delta-per-mcp-tool-call.md:341

[/tdd] Excellent acknowledgment of the approximation in your ADR! Consider adding a test case that demonstrates this limitation:

func TestToolCallsDeltaWithClockSkew(t *testing.T) {
    // Tool call timestamp is AFTER the &quot;next&quot; API call due to clock skew
    // Expected: delta should be 0 (graceful degradation)
}

This would document the behavior when timestamps are out-of-order and prevent future regressions if the algorithm changes.

pkg/cli/token_usage.go:554

[/zoom-out] The function creates a new slice copy of toolCalls (line 593) rather than mutating the input. This is excellent defensive programming!

Consider documenting this choice in a comment so future maintainers understand it's intentional:

// Create a new slice to avoid mutating the input (caller may rely on original)
updated := make([]MCPToolCall, len(toolCalls))
copy(updated, toolCalls)

This makes the "returns a new slice (not mutate the input)" promise from the ADR mor…

actions/setup/js/parse_mcp_gateway_log.test.cjs:198

[/tdd] Great test coverage! The test names are descriptive and read like specifications.

One additional edge case worth testing: concurrent tool calls at the same timestamp. What happens if two tool calls have identical timestamps?

test(&quot;handles tool calls with identical timestamps&quot;, () =&gt; {
  const requests = [
    req(&quot;2026-05-19T21:10:05.000Z&quot;, &quot;tool-a&quot;),
    req(&quot;2026-05-19T21:10:05.000Z&quot;, &quot;tool-b&quot;), // same timestamp
  ];
  // Both should get the same delta or both should g…

</details>

@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 93/100

Excellent test quality

Metric Value
New/modified tests analyzed 13
✅ Design tests (behavioral contracts) 13 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 10 (77%)
Duplicate test clusters 0
Test inflation detected No
🚨 Coding-guideline violations 0

Test Classification Details

View detailed test breakdown (13 tests)

JavaScript: actions/setup/js/parse_mcp_gateway_log.test.cjs (8 tests)

Test Classification Issues Detected
computes delta for a tool call bracketed by two API calls ✅ Design None
returns empty map when tool call has no preceding API call ✅ Design None
returns empty map when tool call has no following API call ✅ Design None
returns empty map for empty inputs ✅ Design None
computes correct deltas for multiple sequential tool calls ✅ Design None
shows ΔET column when deltas are provided ✅ Design None
omits ΔET column when no deltas are provided ✅ Design None
shows dash for requests without a delta ✅ Design None

Go: pkg/cli/token_usage_test.go (5 subtests)

Test Classification Issues Detected
assigns delta to tool calls bracketed by API calls ✅ Design None
leaves delta zero when tool call has no preceding API call ✅ Design None
leaves delta zero when tool call has no following API call ✅ Design None
handles empty token usage file path ✅ Design None
assigns correct deltas to multiple sequential tool calls ✅ Design None

Analysis Highlights

✅ Strengths:

  • 100% behavioral coverage — All tests verify observable outputs and system contracts
  • Excellent edge case handling — Tests cover empty inputs, missing data, and boundary conditions
  • No test inflation — Test-to-production ratio is healthy (JS: 1:1, Go: 0.65:1)
  • Clean code practices — No prohibited mocks, proper build tags, descriptive assertion messages
  • Parallel test design — JavaScript and Go tests validate the same algorithm independently

Test Design Pattern Analysis:

Both test suites follow the same behavioral contract:

  1. Core algorithm correctness — Delta calculation between two API calls
  2. Edge case: Missing previous call — Returns zero/empty when no preceding API call exists
  3. Edge case: Missing next call — Returns zero/empty when no following API call exists
  4. Edge case: Empty inputs — Handles null/empty data gracefully
  5. Multiple calls — Correctly processes sequential tool calls with different deltas

This parallelism across languages strengthens confidence in the algorithm's correctness.


Test Inflation Analysis

File Test Lines Production Lines Ratio Status
parse_mcp_gateway_log.test.cjs +105 +105 1.00:1 ✅ Healthy
token_usage_test.go +92 +141 0.65:1 ✅ Healthy

Threshold: 2:1 (test lines should not exceed 2× production lines)


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 8 tests (vitest)
  • 🐹 Go (*_test.go): 5 subtests (testify, unit build tag)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All tests verify behavioral contracts and would catch regressions if deleted.


Minor Suggestion

The JavaScript tests could benefit from error path testing if computeToolCallTokenDeltas can throw exceptions for malformed input (e.g., invalid JSON in tokenContent). Current tests focus on edge cases (empty/missing data) but don't validate error handling for corrupt data.

However, this is a minor enhancement opportunity — the current test quality is excellent.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 12.9M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 93/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All tests verify behavioral contracts and would catch regressions if deleted.

@github-actions
Copy link
Copy Markdown
Contributor

pr-sous-chef: nudged

@copilot review all comments

Please address unresolved review feedback and post a short completion plan when done. Also consider refreshing the branch and re-running checks.

Generated by 👨‍🍳 PR Sous Chef · ● 205.1K ·

@pelikhan pelikhan merged commit c01d47c into main May 20, 2026
@pelikhan pelikhan deleted the copilot/improve-agent-log-rendering branch May 20, 2026 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants