Capture tool calls in live OpenAI Responses projector #83
Dual-agent review

| Source | Finding (severity, evidence) | Intersects |
|---|---|---|
| Codex | Behavioral Correctness — stream-only tool calls dropped when body has text, major, exchange-projector.js:263-267,386 | Targets (openAiResponsesMessages) + Risks bullet 1. Verified false positive: recorder invariant (isSse ⟹ response_body=null, recorder.js:190,260) makes the mixed body+stream state unreachable. Valid as a latent-fragility flag, not a present bug. |
| Claude | Tool-call drop paths untested, minor, exchange-projector.js:771,781 | Risks bullet 2 |
| Claude | toolOutputText object branches untested, minor, exchange-projector.js:808 | Risks bullet 2 |
| Claude | Malformed-JSON args untested, minor, exchange-projector.js:791 | Risks bullet 2 |
| Claude | Stream call-id dedup untested, minor, exchange-projector.js:427 | Risks bullet 2 |
Codex review
Fix Validations
Live OpenAI Responses tool calls were missing
- Status: incomplete
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:266, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:267, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:386
- Assessment: The new stream parser can capture `response.output_item.done` tool calls, but the top-level dispatcher only uses it when the parsed `response_body` yields zero assistant messages. Any capture that has both a final body and SSE events can still drop stream-only tool calls.
Findings
1) Behavioral Correctness
- Severity: major
- Confidence: high
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:263, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:266, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:267, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:386
- Why it matters: A live SSE exchange with `response_body` containing text plus `stream_events` containing `response.output_item.done` will project the text from the body and skip the stream parser, so the tool call is still not logged.
- Suggested fix: When `streamEvents.length > 0`, run the stream projection and merge/dedupe it with body-projected messages, or pass the parsed top-level body into the stream merge path so stream-only tool calls are appended even when body text exists.
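
The merge that Codex suggests could look roughly like the sketch below. It is illustrative only: `mergeStreamToolCalls` is a hypothetical helper, and the message shape (`{ role, content }` with `tool_use` blocks keyed by `id`) is an assumption, not the projector's actual code.

```javascript
/**
 * Hypothetical helper: append stream-only tool_use blocks to body-projected
 * assistant messages, skipping any call id the body already produced.
 * Message and block shapes are assumed for illustration.
 * @param {{ role: string, content: string | object[] }[]} bodyMessages
 * @param {{ type: string, id: string, name: string, input: unknown }[]} streamToolUses
 */
function mergeStreamToolCalls(bodyMessages, streamToolUses) {
  const seen = new Set()
  for (const message of bodyMessages) {
    // content may be a plain string, so only scan block arrays
    if (!Array.isArray(message.content)) continue
    for (const block of message.content) {
      if (block.type === 'tool_use') seen.add(block.id)
    }
  }
  const missing = streamToolUses.filter((block) => !seen.has(block.id))
  if (missing.length === 0) return bodyMessages
  return [...bodyMessages, { role: 'assistant', content: missing }]
}

// Body has text only; the stream carries the tool call
const merged = mergeStreamToolCalls(
  [{ role: 'assistant', content: [{ type: 'text', text: 'Running it now.' }] }],
  [{ type: 'tool_use', id: 'c1', name: 'shell', input: { cmd: 'ls' } }]
)
console.log(merged.length)
```

Given the recorder invariant noted in the summary table, this path is unreachable today; the sketch only shows what hardening against the latent case might involve.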
No Finding
- Contract & Interface Fidelity
- Change Impact / Blast Radius
- Concurrency, Ordering & State Safety
- Error Handling & Resilience
- Security Surface
- Resource Lifecycle & Cleanup
- Release Safety
- Test Evidence Quality
- Architectural Consistency
- Debuggability & Operability
Evidence Bundle
- Changed hot paths: `openAiResponsesMessages`; `responsesInputMessages`; `responsesAssistantMessagesFromBody`; `responsesAssistantMessagesFromStream`; `toolUseBlockFromPayload`; `toolResultBlockFromPayload`
- Impacted callers: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:184
- Impacted tests: test/plugins/codex-exchange-projector.test.js:131, test/plugins/codex-exchange-projector.test.js:197, test/plugins/codex-exchange-projector.test.js:275, test/plugins/codex-exchange-projector.test.js:311, test/plugins/codex-exchange-projector.test.js:351
- Unresolved uncertainty: I did not verify the gateway capture invariant for SSE `response_body`; the new tests only cover SSE with `response_body: ''`, so the mixed body-plus-stream case remains unproven.
Claude review
Five parallel review subagents (guidance compliance, shallow bug scan, historical
context, contract & callers, comments & tests) examined the PR at head b482bfc.
Four of the five found no defects: the change is style-compliant (no semicolons,
JSDoc-only types, @import at top, no @typedef), introduces no logic bugs,
re-introduces no historically-fixed bug, keeps the only export
(createCodexExchangeProjector) and its private-helper renames internally
consistent, and the new tool_use/tool_result block shapes are already
supported by the downstream gateway part-expander (message_projector.js:534-542,
columns tool_args/tool_result_for exist) and the kernel types
(AiGatewayProjectedMessage.content: string | JsonObject[]). The live helpers
mirror codex/src/backfill.js byte-for-byte, so turn-1 response rows hash-equal
turn-2 input-replay rows. Full suite: 759 tests pass.
The surviving findings are all test-coverage gaps on branches this PR introduced.
The implementation in each case was independently verified correct; these raise
regression-detection risk rather than describing a present defect, hence minor.
Tool-call drop paths (undefined returns) are untested
- Severity: minor
- Confidence: 90
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:771, :781
- Why it matters: A `function_call` missing `name`/`call_id` (or `function_call_output` missing `call_id`) is silently skipped via `if (block) …`, and the load-bearing `call_id ?? id` fallback is never exercised: every test supplies `call_id`, so a regression that started dropping valid calls would pass.
- Suggested fix: Add cases for (a) a `function_call` with only `id` (no `call_id`) → still captured keyed on `id`; (b) a `function_call` missing `name` → dropped, not emitted broken.
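
The two suggested cases can be sketched against a stand-in for `toolUseBlockFromPayload`. The function body below is a guess at the behaviour the review describes (key on `call_id ?? id`, return `undefined` when the name or id is missing), not the implementation at exchange-projector.js:771; `toolUseBlockSketch` and the `tool_use` block shape are assumptions.

```javascript
/**
 * Stand-in for the helper under review; the real one may differ.
 * @param {{ type?: string, name?: string, call_id?: string, id?: string, arguments?: string }} payload
 */
function toolUseBlockSketch(payload) {
  const callId = payload.call_id ?? payload.id
  if (!payload.name || !callId) return undefined
  return { type: 'tool_use', id: callId, name: payload.name, input: payload.arguments ?? '' }
}

// (a) only `id` present: still captured, keyed on `id`
const keyedOnId = toolUseBlockSketch({ type: 'function_call', id: 'fc_1', name: 'shell' })

// (b) `name` missing: dropped rather than emitted broken
const dropped = toolUseBlockSketch({ type: 'function_call', call_id: 'c1' })

console.log(keyedOnId?.id, dropped)
```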
toolOutputText object/wrapper/stringify branches are untested
- Severity: minor
- Confidence: 88
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:808
- Why it matters: Every `function_call_output` test uses a plain string `output`; the structured-output path (`{ output | content | text }` unwrap and `JSON.stringify` fallback), which is the exact case the JSDoc says it handles and a common real Codex shape, has zero coverage.
- Suggested fix: Add a `function_call_output` with `output: { output: 'x' }` (expect `content: 'x'`) and one with `output: { foo: 1 }` (expect `content: '{"foo":1}'`).
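
For reference, the unwrap-then-stringify behaviour described above might be sketched as follows. `toolOutputTextSketch` is a hypothetical stand-in written from the review's description; the key order and the exact fallback are assumptions and may not match exchange-projector.js:808.

```javascript
/**
 * Stand-in for toolOutputText; assumed behaviour only.
 * Strings pass through, known wrapper keys are unwrapped,
 * anything else is serialized with JSON.stringify.
 * @param {unknown} output
 * @returns {string}
 */
function toolOutputTextSketch(output) {
  if (typeof output === 'string') return output
  if (output && typeof output === 'object') {
    for (const key of ['output', 'content', 'text']) {
      const inner = output[key]
      if (typeof inner === 'string') return inner
    }
  }
  // JSON.stringify(undefined) yields undefined, so fall back to ''
  return JSON.stringify(output) ?? ''
}

console.log(toolOutputTextSketch({ output: 'x' }))
console.log(toolOutputTextSketch({ foo: 1 }))
```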
Malformed-JSON tool arguments are untested
- Severity: minor
- Confidence: 85
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:791
- Why it matters: The raw-string fallback in `normalizeToolInput` (`parsed === value ? value`) is what protects against truncated/garbled streamed args, but no test feeds a JSON-looking-but-invalid `arguments` string (e.g. `'{"cmd":'`).
- Suggested fix: One case with `arguments: '{"cmd":'` → expect `input === '{"cmd":'` (preserved verbatim, not dropped, no throw).
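
A minimal sketch of the fallback being described, assuming the helper tries `JSON.parse` and keeps the raw string on failure. `normalizeToolInputSketch` is hypothetical and is not the code at exchange-projector.js:791.

```javascript
/**
 * Stand-in for normalizeToolInput; assumed behaviour only.
 * Parses JSON argument strings and returns the raw string when parsing fails.
 * @param {unknown} value
 */
function normalizeToolInputSketch(value) {
  if (typeof value !== 'string') return value
  try {
    return JSON.parse(value)
  } catch {
    // Truncated or garbled streamed args: preserve verbatim instead of throwing
    return value
  }
}

console.log(normalizeToolInputSketch('{"cmd":'))
console.log(normalizeToolInputSketch('{"cmd":"ls"}'))
```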
Stream call-id dedup branch is untested
- Severity: minor
- Confidence: 80
- Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:427
- Why it matters: The new logic avoids double-emitting a tool call present in both `response.output_item.done` and the `response.completed` body, but the "prefers completed body" test has no `output_item.done`, so `toolUsesByCallId` is empty and the dedup loop never runs; a regression that emitted the tool twice would pass.
- Suggested fix: One SSE case where `output_item.done` carries `call_id: 'c1'` AND the completed body's `output[]` also contains `function_call call_id: 'c1'` → assert the tool appears exactly once.
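
The suggested SSE case can be sketched with a simplified stand-in for the stream projection. `toolCallsFromStreamSketch` is hypothetical; it only illustrates the "prefer the completed body, append done-only calls" idea and assumes already-parsed event objects with `item` and `response.output` fields.

```javascript
/**
 * Hypothetical stream projection: collect function_call items from
 * `response.output_item.done`, then prefer the `response.completed` body
 * and append only the done-only calls it lacks.
 * @param {{ type: string, item?: object, response?: { output?: object[] } }[]} events
 */
function toolCallsFromStreamSketch(events) {
  const toolUsesByCallId = new Map()
  let completedOutput = []
  for (const event of events) {
    if (event.type === 'response.output_item.done' && event.item?.type === 'function_call') {
      toolUsesByCallId.set(event.item.call_id ?? event.item.id, event.item)
    }
    if (event.type === 'response.completed') {
      completedOutput = event.response?.output ?? []
    }
  }
  const completedCalls = completedOutput.filter((item) => item.type === 'function_call')
  // Dedup: drop stream entries the completed body already carries
  for (const item of completedCalls) toolUsesByCallId.delete(item.call_id ?? item.id)
  return [...completedCalls, ...toolUsesByCallId.values()]
}

const events = [
  { type: 'response.output_item.done', item: { type: 'function_call', call_id: 'c1', name: 'shell' } },
  { type: 'response.completed', response: { output: [{ type: 'function_call', call_id: 'c1', name: 'shell' }] } }
]
const projected = toolCallsFromStreamSketch(events)
console.log(projected.length)
```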
Reports: /Users/phil/workspace/hypaware/.git/worktrees/dual-review-pr83/dual-review/pr-83
philcunliffe
left a comment
Not concerned with Codex's finding.
Previously we were not logging Codex tool calls live; they would appear if backfilled but were not captured during a live exchange.
This PR adds the functionality to exchange-projector.js to capture the tool calls.