Capture tool calls in live OpenAI Responses projector by bgmcmullen · Pull Request #83 · hyparam/hypaware

bgmcmullen · 2026-06-04T17:18:45Z

We previously were not logging Codex tool calls, they would appear if backfilled but would not be captured live.

This PR adds the functionality to exchange-projector.js to capture the tool calls.

philcunliffe · 2026-06-04T17:34:07Z

Dual-agent review — `request_changes`

Verdict: request_changes
Risk class: medium
Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Advisory only: no merge was attempted.

Reviewer reconciliation: The request_changes verdict is driven by one Codex major (Behavioral Correctness) plus medium blast radius. Independent verification found Codex's finding to be a false positive under the current recorder invariant (is_sse ⟹ response_body=null, recorder.js:190,260), which makes the body+stream mixed state it requires unreachable. It remains valid as a latent-fragility flag. The remaining Claude findings are edge-case test-coverage gaps, not defects. Practical recommendation: safe to land; optionally add the four edge-case tests and a guard/comment documenting the recorder invariant.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source	Finding (severity, evidence)	Intersects
Codex	Behavioral Correctness — stream-only tool calls dropped when body has text, major, exchange-projector.js:263-267,386	Targets (`openAiResponsesMessages`) + Risks bullet 1. Verified false positive: recorder invariant (`isSse ⟹ response_body=null`, `recorder.js:190,260`) makes the mixed body+stream state unreachable. Valid as a latent-fragility flag, not a present bug.
Claude	Tool-call drop paths untested, minor, exchange-projector.js:771,781	Risks bullet 2
Claude	`toolOutputText` object branches untested, minor, exchange-projector.js:808	Risks bullet 2
Claude	Malformed-JSON args untested, minor, exchange-projector.js:791	Risks bullet 2
Claude	Stream call-id dedup untested, minor, exchange-projector.js:427	Risks bullet 2

Codex review

Fix Validations

Live OpenAI Responses tool calls were missing

Status: incomplete
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:266, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:267, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:386
Assessment: The new stream parser can capture response.output_item.done tool calls, but the top-level dispatcher only uses it when the parsed response_body yields zero assistant messages. Any capture that has both a final body and SSE events can still drop stream-only tool calls.

Findings

1) Behavioral Correctness

Severity: major
Confidence: high
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:263, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:266, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:267, hypaware-core/plugins-workspace/codex/src/exchange-projector.js:386
Why it matters: A live SSE exchange with response_body containing text plus stream_events containing response.output_item.done will project the text from the body and skip the stream parser, so the tool call is still not logged.
Suggested fix: When streamEvents.length > 0, run the stream projection and merge/dedupe it with body-projected messages, or pass the parsed top-level body into the stream merge path so stream-only tool calls are appended even when body text exists.

No Finding

Contract & Interface Fidelity
Change Impact / Blast Radius
Concurrency, Ordering & State Safety
Error Handling & Resilience
Security Surface
Resource Lifecycle & Cleanup
Release Safety
Test Evidence Quality
Architectural Consistency
Debuggability & Operability

Evidence Bundle

Changed hot paths: openAiResponsesMessages; responsesInputMessages; responsesAssistantMessagesFromBody; responsesAssistantMessagesFromStream; toolUseBlockFromPayload; toolResultBlockFromPayload
Impacted callers: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:184
Impacted tests: test/plugins/codex-exchange-projector.test.js:131, test/plugins/codex-exchange-projector.test.js:197, test/plugins/codex-exchange-projector.test.js:275, test/plugins/codex-exchange-projector.test.js:311, test/plugins/codex-exchange-projector.test.js:351
Unresolved uncertainty: I did not verify the gateway capture invariant for SSE response_body; the new tests only cover SSE with response_body: '', so the mixed body-plus-stream case remains unproven.

Claude review

Claude review

Five parallel review subagents (guidance compliance, shallow bug scan, historical
context, contract & callers, comments & tests) examined the PR at head b482bfc.
Four of the five found no defects: the change is style-compliant (no semicolons,
JSDoc-only types, @import at top, no @typedef), introduces no logic bugs,
re-introduces no historically-fixed bug, keeps the only export
(createCodexExchangeProjector) and its private-helper renames internally
consistent, and the new tool_use/tool_result block shapes are already
supported by the downstream gateway part-expander (message_projector.js:534-542,
columns tool_args/tool_result_for exist) and the kernel types
(AiGatewayProjectedMessage.content: string | JsonObject[]). The live helpers
mirror codex/src/backfill.js byte-for-byte, so turn-1 response rows hash-equal
turn-2 input-replay rows. Full suite: 759 tests pass.

The surviving findings are all test-coverage gaps on branches this PR introduced.
The implementation in each case was independently verified correct; these raise
regression-detection risk rather than describing a present defect, hence minor.

Tool-call drop paths (`undefined` returns) are untested

Severity: minor
Confidence: 90
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:771, :781
Why it matters: A function_call missing name/call_id (or function_call_output missing call_id) is silently skipped via if (block) …, and the load-bearing call_id ?? id fallback is never exercised — every test supplies call_id, so a regression that started dropping valid calls would pass.
Suggested fix: Add cases for (a) a function_call with only id (no call_id) → still captured keyed on id; (b) a function_call missing name → dropped, not emitted broken.

`toolOutputText` object/wrapper/stringify branches are untested

Severity: minor
Confidence: 88
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:808
Why it matters: Every function_call_output test uses a plain string output; the structured-output path ({ output | content | text } unwrap and JSON.stringify fallback) — the exact case the JSDoc says it handles, and a common real Codex shape — has zero coverage.
Suggested fix: Add a function_call_output with output: { output: 'x' } (expect content: 'x') and one with output: { foo: 1 } (expect content: '{"foo":1}').

Malformed-JSON tool arguments are untested

Severity: minor
Confidence: 85
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:791
Why it matters: The raw-string fallback in normalizeToolInput (parsed === value ? value) is what protects against truncated/garbled streamed args, but no test feeds a JSON-looking-but-invalid arguments string (e.g. '{"cmd":').
Suggested fix: One case with arguments: '{"cmd":' → expect input === '{"cmd":' (preserved verbatim, not dropped, no throw).

Stream call-id dedup branch is untested

Severity: minor
Confidence: 80
Evidence: hypaware-core/plugins-workspace/codex/src/exchange-projector.js:427
Why it matters: The new logic avoids double-emitting a tool call present in both response.output_item.done and the response.completed body, but the "prefers completed body" test has no output_item.done, so toolUsesByCallId is empty and the dedup loop never runs — a regression that emitted the tool twice would pass.
Suggested fix: One SSE case where output_item.done carries call_id: 'c1' AND the completed body's output[] also contains function_call call_id: 'c1' → assert the tool appears exactly once.

Reports: /Users/phil/workspace/hypaware/.git/worktrees/dual-review-pr83/dual-review/pr-83

philcunliffe

Not concerned with codex's finding

bgmcmullen added 2 commits June 3, 2026 22:21

track open ai response format tool calls

d30c8df

fix review bugs

b482bfc

bgmcmullen requested a review from philcunliffe June 4, 2026 17:20

philcunliffe approved these changes Jun 4, 2026

View reviewed changes

bgmcmullen merged commit 027e77c into master Jun 4, 2026
6 checks passed

bgmcmullen deleted the log-codex-tool-calls branch June 4, 2026 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capture tool calls in live OpenAI Responses projector#83

Capture tool calls in live OpenAI Responses projector#83
bgmcmullen merged 2 commits into
masterfrom
log-codex-tool-calls

bgmcmullen commented Jun 4, 2026

Uh oh!

philcunliffe commented Jun 4, 2026

Fix Validations

Live OpenAI Responses tool calls were missing

Findings

1) Behavioral Correctness

No Finding

Evidence Bundle

Claude review

Tool-call drop paths (`undefined` returns) are untested

`toolOutputText` object/wrapper/stringify branches are untested

Malformed-JSON tool arguments are untested

Stream call-id dedup branch is untested

Uh oh!

philcunliffe left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bgmcmullen commented Jun 4, 2026

Uh oh!

philcunliffe commented Jun 4, 2026

Dual-agent review — request_changes

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Fix Validations

Live OpenAI Responses tool calls were missing

Findings

1) Behavioral Correctness

No Finding

Evidence Bundle

Claude review

Tool-call drop paths (undefined returns) are untested

toolOutputText object/wrapper/stringify branches are untested

Malformed-JSON tool arguments are untested

Stream call-id dedup branch is untested

Uh oh!

philcunliffe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Dual-agent review — `request_changes`

Tool-call drop paths (`undefined` returns) are untested

`toolOutputText` object/wrapper/stringify branches are untested