Merged
16 changes: 16 additions & 0 deletions .changeset/grumpy-bottles-sneeze.md
@@ -0,0 +1,16 @@
---
'@colony/storage': patch
'colonyq': patch
---

Fix `aggregateMcpMetrics` error_reasons grouping so per-row counts sum to
`error_count`. The grouping previously partitioned by `(operation, error_code,
error_message)`, but several handlers embed unique session IDs in their error
messages (e.g. `sub-task is claimed by codex-session-XYZ`), so each race loss
produced a distinct group. Combined with a 3-row truncation per operation, the
result was that nearly all errors were hidden — `task_plan_claim_subtask` would
report 7 errors in the Top error reasons table while the Operations table showed
93 for the same row. Grouping now drops `error_message` from the key (SQLite
picks the row with the latest `ts` for the sample message via its bare-column-
with-MAX optimization) and the per-operation cap is bumped from 3 to 8 since
codes are low-cardinality. Sum-of-reasons now matches error_count exactly.
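The bare-column-with-MAX behavior this change relies on is easy to check against an embedded SQLite. A minimal sketch, using Python's stdlib `sqlite3` (its bundled SQLite applies the same optimization) and a reduced stand-in for the `mcp_metrics` table — column names mirror the query in this PR, the row data is hypothetical:

```python
import sqlite3

# In-memory stand-in for mcp_metrics, reduced to the columns the
# error_reasons query touches.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE mcp_metrics "
    "(ts INTEGER, operation TEXT, ok INTEGER, error_code TEXT, error_message TEXT)"
)
db.executemany(
    "INSERT INTO mcp_metrics VALUES (?, ?, ?, ?, ?)",
    [
        (1000, "task_plan_claim_subtask", 0, "PLAN_SUBTASK_NOT_AVAILABLE",
         "sub-task is claimed by codex-session-a"),
        (1001, "task_plan_claim_subtask", 0, "PLAN_SUBTASK_NOT_AVAILABLE",
         "sub-task is claimed by codex-session-b"),
        (1002, "task_plan_claim_subtask", 0, "PLAN_SUBTASK_NOT_FOUND",
         "no sub-task at spec/x/sub-0"),
    ],
)

# Because the query contains exactly one max() aggregate, SQLite takes the
# bare error_message column from the row holding MAX(ts) — each group keeps
# its most recent message as the diagnostic sample.
result = db.execute(
    """SELECT operation, error_code, error_message,
              COUNT(*) AS count, MAX(ts) AS last_ts
       FROM mcp_metrics
       WHERE ok = 0
       GROUP BY operation, error_code
       ORDER BY operation ASC, count DESC, last_ts DESC"""
).fetchall()
for row in result:
    print(row)
# → ('task_plan_claim_subtask', 'PLAN_SUBTASK_NOT_AVAILABLE',
#    'sub-task is claimed by codex-session-b', 2, 1001)
#   ('task_plan_claim_subtask', 'PLAN_SUBTASK_NOT_FOUND',
#    'no sub-task at spec/x/sub-0', 1, 1002)
```

Note the first group's sample message comes from ts=1001, the latest `NOT_AVAILABLE` row, while its count still covers both rows.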
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-14
@@ -0,0 +1,24 @@
# agent-claude-group-mcp-metric-error-reasons-by-code-o-2026-05-14-14-14 (minimal / T1)

Branch: `agent/claude/group-mcp-metric-error-reasons-by-code-o-2026-05-14-14-14`

Drop `error_message` from the error_reasons grouping key in `aggregateMcpMetrics`
so per-row counts sum to `error_count`. Handlers like `task_plan_claim_subtask`
embed unique session IDs in their messages ("sub-task is claimed by codex-X"),
which fragmented identical errors into many rows and pushed most out of the
3-row truncation. SQLite returns the latest message via the bare-column-with-
MAX optimization, keeping a diagnostic sample without bloating the key. Per-op
cap bumped from 3 → 8 since codes are low-cardinality. Verified on live DB:
`task_plan_claim_subtask` previously reported 7 of 93 errors in `Top error
reasons`; now reports 60 + 33 + 1 = 94 across three codes.
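The fragmentation-plus-truncation interaction described above can be reproduced with plain counters. A sketch under the miniature 25 + 2 scenario from the test below (hypothetical data, not the live DB):

```python
from collections import Counter

# 25 race losses with unique session IDs plus two not-found errors.
errors = [
    ("PLAN_SUBTASK_NOT_AVAILABLE", f"sub-task is claimed by codex-session-{i}")
    for i in range(25)
]
errors += [
    ("PLAN_SUBTASK_NOT_FOUND", "no sub-task at spec/x/sub-0"),
    ("PLAN_SUBTASK_NOT_FOUND", "no sub-task at spec/y/sub-1"),
]

CAP = 3  # the old per-operation truncation

by_message = Counter(errors)                   # old key: (code, message)
by_code = Counter(code for code, _ in errors)  # new key: code only

# Old grouping: every message is unique, so all 27 groups have count 1 and
# the cap shows only 3 of them. New grouping: 2 low-cardinality codes fit
# comfortably under any cap, so the visible counts sum to error_count.
visible_old = sum(n for _, n in by_message.most_common(CAP))
visible_new = sum(n for _, n in by_code.most_common(CAP))
print(visible_old, "of", len(errors))  # → 3 of 27
print(visible_new, "of", len(errors))  # → 27 of 27
```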

## Handoff

- Handoff: change=`agent-claude-group-mcp-metric-error-reasons-by-code-o-2026-05-14-14-14`; branch=`agent/<your-name>/<branch-slug>`; scope=`TODO`; action=`continue this sandbox or finish cleanup after a usage-limit/manual takeover`.
- Copy prompt: Continue `agent-claude-group-mcp-metric-error-reasons-by-code-o-2026-05-14-14-14` on branch `agent/<your-name>/<branch-slug>`. Work inside the existing sandbox, review `openspec/changes/agent-claude-group-mcp-metric-error-reasons-by-code-o-2026-05-14-14-14/notes.md`, continue from the current state instead of creating a new sandbox, and when the work is done run `gx branch finish --branch agent/<your-name>/<branch-slug> --base dev --via-pr --wait-for-merge --cleanup`.

## Cleanup

- [ ] Run: `gx branch finish --branch agent/<your-name>/<branch-slug> --base dev --via-pr --wait-for-merge --cleanup`
- [ ] Record PR URL + `MERGED` state in the completion handoff.
- [ ] Confirm sandbox worktree is gone (`git worktree list`, `git branch -a`).
8 changes: 4 additions & 4 deletions packages/storage/src/storage.ts
@@ -1420,7 +1420,7 @@ export class Storage {
FROM mcp_metrics
${where}
AND ok = 0
-GROUP BY operation, error_code, error_message
+GROUP BY operation, error_code
ORDER BY operation ASC, count DESC, last_ts DESC`,
)
.all(...args) as McpMetricsErrorReasonRawRow[];
@@ -1429,7 +1429,7 @@
const operation = row.operation;
if (!operation) continue;
const reasons = byOperation.get(operation) ?? [];
-if (reasons.length >= 3) continue;
+if (reasons.length >= 8) continue;
reasons.push(normalizeMcpErrorReason(row));
byOperation.set(operation, reasons);
}
@@ -1449,9 +1449,9 @@
FROM mcp_metrics
${where}
AND ok = 0
-GROUP BY error_code, error_message
+GROUP BY error_code
ORDER BY count DESC, last_ts DESC
-LIMIT 3`,
+LIMIT 8`,
)
.all(...args) as McpMetricsErrorReasonRawRow[];
return rows.map((row) => normalizeMcpErrorReason(row));
40 changes: 40 additions & 0 deletions packages/storage/test/mcp-metrics.test.ts
@@ -190,6 +190,46 @@ describe('mcp_metrics storage', () => {
expect(empty.operations).toEqual([]);
});

it('groups error_reasons by code so per-error-row counts sum to error_count', () => {
for (let i = 0; i < 25; i += 1) {
record(storage, {
ts: 1_000 + i,
operation: 'task_plan_claim_subtask',
ok: false,
error_code: 'PLAN_SUBTASK_NOT_AVAILABLE',
error_message: `sub-task is claimed by codex-session-${i}`,
});
}
record(storage, {
ts: 1_100,
operation: 'task_plan_claim_subtask',
ok: false,
error_code: 'PLAN_SUBTASK_NOT_FOUND',
error_message: 'no sub-task at spec/x/sub-0',
});
record(storage, {
ts: 1_200,
operation: 'task_plan_claim_subtask',
ok: false,
error_code: 'PLAN_SUBTASK_NOT_FOUND',
error_message: 'no sub-task at spec/y/sub-1',
});

const agg = storage.aggregateMcpMetrics({ since: 0 });
const claim = agg.operations.find((row) => row.operation === 'task_plan_claim_subtask');
if (!claim) throw new Error('expected task_plan_claim_subtask row');
expect(claim.error_count).toBe(27);
const sum = claim.error_reasons.reduce((acc, r) => acc + r.count, 0);
expect(sum).toBe(27);
const codes = claim.error_reasons.map((r) => r.error_code).sort();
expect(codes).toEqual(['PLAN_SUBTASK_NOT_AVAILABLE', 'PLAN_SUBTASK_NOT_FOUND']);
const notAvailable = claim.error_reasons.find(
(r) => r.error_code === 'PLAN_SUBTASK_NOT_AVAILABLE',
);
expect(notAvailable?.count).toBe(25);
expect(notAvailable?.error_message).toContain('sub-task is claimed by codex-session-');
});

it('countMcpMetricsSince counts rows in the window', () => {
record(storage, { ts: 100, operation: 'search' });
record(storage, { ts: 500, operation: 'search' });