feat: new Scenario API ✨ #2
Merged
Merged
Conversation
…ew tests, getting things started
…g and returning messages as they wish, simplify the executor code, add stronger and more robust validations and conversions
… pending agents, treat all agents the same, just keep their roles proceeding, get ready for scripted runs
…ned, as users will be able to manually evaluate it later
…the testing agent
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a new Scenario API with improvements in agent adapter configuration, scenario scripting, and enhanced type safety across several modules. Key changes include:
- Conversion of agent functions to subclass-based adapters for both testing and production.
- Refactoring of configuration merging and scenario scripting for better consistency.
- Updates to tests and examples to support the new workflow and API signatures.
Reviewed Changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/* | Updated tests to use new adapter subclass pattern |
| scenario/* | Refactored Scenario initialization, configuration, and scripting |
| examples/* | Updated examples to align with new Scenario API design |
| setup.py, pyproject.toml, Makefile | Version updates and dependency additions |
Comments suppressed due to low confidence (2)
scenario/config.py:32
- [nitpick] Instead of creating a custom items() helper to merge configuration values, consider using Pydantic’s model_dump(exclude_none=True) directly to improve clarity and consistency.
def merge(self, other: "ScenarioConfig") -> "ScenarioConfig":
scenario/scenario.py:100
- Filtering None values out of the agents list before type and subclass checks would prevent potential unclear ValueErrors; consider using a list comprehension to include only non-null agents.
agents = agents or [kwargs.get("testing_agent"), agent, # type: ignore]
| def __init__(self, input: AgentInput): | ||
| super().__init__(input) | ||
|
|
||
| if not self.model: |
There was a problem hiding this comment.
[nitpick] The check for an empty model value may be ambiguous if self.model is an empty string. Consider a more explicit validation or providing a sensible default in TestingAgent.with_config.
Suggested change
| if not self.model: | |
| if not isinstance(self.model, str) or not self.model.strip(): |
0xdeafcafe
approved these changes
Jun 12, 2025
…h don't play with well json, also, make sure return types can always be converted into a dict
drewdrewthis
added a commit
that referenced
this pull request
Apr 16, 2026
…ging Concerns resolved from the second review pass: - #1 Drain a pending wait=False agent turn at the top of _script_call_agent plus proceed/succeed/fail so judge(), succeed(), fail(), proceed() see the completed agent message. Guard against self-await when the drain enters on the background task itself. - #2 voice_style no longer injects "[style] text" inline — every registered provider would have spoken the bracketed word aloud. Emit a one-shot UserWarning and synthesise without modification until per-provider instructions channels land. - #5 Replace blanket "except Exception: pass" in hook fire helpers with logger.warning(..., exc_info=True) so callback bugs are visible. - #6 Bound TTS cache to 64 LRU entries — ~14 MB per 5-min clip worst case caps the cache at ~900 MB even for long utterances. Prevents unbounded growth in long-lived processes. - #7 background_noise path fallback now requires a separator or .wav suffix before treating the argument as a filesystem path, avoiding the cwd footgun where a typo'd preset name matches a stray local file. - #9 Replace module-global _WARNED_ADAPTERS with WebRTCVadFallback.reset_warnings() classmethod so tests don't need to reach into private module state. Update tests accordingly. - #10 Rewrite PendingTransportError hint: remind subclass authors that the inherited AdapterCapabilities ClassVar must be re-audited, so a subclass claiming streaming_transcripts=True without a real transcript stream does not silently break after_words interruption. - #11 Defensively trim a trailing odd byte from OpenAI TTS PCM responses and pin the PCM16 @ 24kHz mono expectation in the docstring. Invariant asserted at AudioChunk boundary (see #14). - #13 OpenAI Realtime user-role text routing: when the user-role agent is an OpenAIRealtimeAgent, scripted user("text") now invokes send_text on the realtime session instead of TTS. Explicit AC from §7.2 L1164-1171. - #14 AudioChunk.__post_init__ raises ValueError on odd-byte PCM16 data, catching partial-frame bugs at the canonical boundary instead of letting them silently drift through np.frombuffer / duration_seconds. Deferred to follow-ups (noted in PR body, not blocking #350): - #3 stub adapters transport wire-up - #4 narrow public surface for executor/sim state - #8 rename noise presets to match synthetic content - #12 pytest-bdd wiring for the 83 Gherkin scenarios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Apr 20, 2026
…ging Concerns resolved from the second review pass: - #1 Drain a pending wait=False agent turn at the top of _script_call_agent plus proceed/succeed/fail so judge(), succeed(), fail(), proceed() see the completed agent message. Guard against self-await when the drain enters on the background task itself. - #2 voice_style no longer injects "[style] text" inline — every registered provider would have spoken the bracketed word aloud. Emit a one-shot UserWarning and synthesise without modification until per-provider instructions channels land. - #5 Replace blanket "except Exception: pass" in hook fire helpers with logger.warning(..., exc_info=True) so callback bugs are visible. - #6 Bound TTS cache to 64 LRU entries — ~14 MB per 5-min clip worst case caps the cache at ~900 MB even for long utterances. Prevents unbounded growth in long-lived processes. - #7 background_noise path fallback now requires a separator or .wav suffix before treating the argument as a filesystem path, avoiding the cwd footgun where a typo'd preset name matches a stray local file. - #9 Replace module-global _WARNED_ADAPTERS with WebRTCVadFallback.reset_warnings() classmethod so tests don't need to reach into private module state. Update tests accordingly. - #10 Rewrite PendingTransportError hint: remind subclass authors that the inherited AdapterCapabilities ClassVar must be re-audited, so a subclass claiming streaming_transcripts=True without a real transcript stream does not silently break after_words interruption. - #11 Defensively trim a trailing odd byte from OpenAI TTS PCM responses and pin the PCM16 @ 24kHz mono expectation in the docstring. Invariant asserted at AudioChunk boundary (see #14). - #13 OpenAI Realtime user-role text routing: when the user-role agent is an OpenAIRealtimeAgent, scripted user("text") now invokes send_text on the realtime session instead of TTS. Explicit AC from §7.2 L1164-1171. - #14 AudioChunk.__post_init__ raises ValueError on odd-byte PCM16 data, catching partial-frame bugs at the canonical boundary instead of letting them silently drift through np.frombuffer / duration_seconds. Deferred to follow-ups (noted in PR body, not blocking #350): - #3 stub adapters transport wire-up - #4 narrow public surface for executor/sim state - #8 rename noise presets to match synthetic content - #12 pytest-bdd wiring for the 83 Gherkin scenarios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Apr 22, 2026
…tion Closes gaps surfaced during AC-7/AC-8 happy-path doc writing: - README gains a "Voice Agents" section with a minimal ElevenLabs example, adapter inventory, feature surface summary, and pointers to the two happy-path docs. Fills gap #1 (OpenAI key also needed for ElevenLabs tests) and gap #11 (CI example missing). - scripts/provision_elevenlabs_agent.py header clarifies the script is for the SDK's own CI, NOT for SDK users — closes gap #3 (user confusion about when to run it). Remaining gaps (#2, #4, #5, #7, #8, #9, #10) were already covered by the happy-path docs themselves. Gap #6 (verify scripted user("text") against live ElevenLabs) is verified by the AC-6 suite run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 11, 2026
…ging Concerns resolved from the second review pass: - #1 Drain a pending wait=False agent turn at the top of _script_call_agent plus proceed/succeed/fail so judge(), succeed(), fail(), proceed() see the completed agent message. Guard against self-await when the drain enters on the background task itself. - #2 voice_style no longer injects "[style] text" inline — every registered provider would have spoken the bracketed word aloud. Emit a one-shot UserWarning and synthesise without modification until per-provider instructions channels land. - #5 Replace blanket "except Exception: pass" in hook fire helpers with logger.warning(..., exc_info=True) so callback bugs are visible. - #6 Bound TTS cache to 64 LRU entries — ~14 MB per 5-min clip worst case caps the cache at ~900 MB even for long utterances. Prevents unbounded growth in long-lived processes. - #7 background_noise path fallback now requires a separator or .wav suffix before treating the argument as a filesystem path, avoiding the cwd footgun where a typo'd preset name matches a stray local file. - #9 Replace module-global _WARNED_ADAPTERS with WebRTCVadFallback.reset_warnings() classmethod so tests don't need to reach into private module state. Update tests accordingly. - #10 Rewrite PendingTransportError hint: remind subclass authors that the inherited AdapterCapabilities ClassVar must be re-audited, so a subclass claiming streaming_transcripts=True without a real transcript stream does not silently break after_words interruption. - #11 Defensively trim a trailing odd byte from OpenAI TTS PCM responses and pin the PCM16 @ 24kHz mono expectation in the docstring. Invariant asserted at AudioChunk boundary (see #14). - #13 OpenAI Realtime user-role text routing: when the user-role agent is an OpenAIRealtimeAgent, scripted user("text") now invokes send_text on the realtime session instead of TTS. Explicit AC from §7.2 L1164-1171. - #14 AudioChunk.__post_init__ raises ValueError on odd-byte PCM16 data, catching partial-frame bugs at the canonical boundary instead of letting them silently drift through np.frombuffer / duration_seconds. Deferred to follow-ups (noted in PR body, not blocking #350): - #3 stub adapters transport wire-up - #4 narrow public surface for executor/sim state - #8 rename noise presets to match synthetic content - #12 pytest-bdd wiring for the 83 Gherkin scenarios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks
drewdrewthis
added a commit
that referenced
this pull request
May 22, 2026
…g-safe compare, coverage Addresses 8 of the 13 actionable items from the /review fanout: Security: - twilio-server.ts: cap webhook body at 1 MB via streaming guard; reject with HTTP 413 instead of accumulating into memory (concern #7). - twilio-shared.ts: replace hand-rolled XOR signature compare with `crypto.timingSafeEqual` on decoded base64 buffers — Node-stdlib primitive, no DIY constant-time math (concern #10). - twilio-tunnel.ts: drop `(0, eval)("(name) => import(name)")` indirect; use bare dynamic `import()` in try/catch on ERR_MODULE_NOT_FOUND so bundlers and security scanners can analyze the path (concern #8). Coverage (the highest-risk port-only LOC was untested): - twilio.test.ts: codec round-trip — 100 ms 440 Hz sine wave through pcm16/24k → mulaw/8k → pcm16/24k, average abs sample-diff < 2000 (under 10 % of peak). Plus empty-input case. - twilio.test.ts: `verifyTwilioSignature` valid-signature accept, wrong-token reject, wrong-URL reject, missing-signature reject. - twilio.test.ts: `validateE164` + `validateDtmf` accept/reject + the TwiML-injection payload the docstring warns about. - twilio.test.ts: `onDtmf` callback fires on `dtmf` frame, `allowedCallers` filter rejects + records, stop-frame flush enqueues a final AudioChunk. Observability + boy-scout: - twilio-logger.ts (new): minimal `[twilio] …` console wrapper mirroring Python's `logging.getLogger("scenario.voice.twilio")`. Same log sites as the Python parity — body-cap violation, signature rejection, disallowed-caller reject, DTMF receipt, onDtmf callback error (concerns #1 + #14). - twilio-shared.ts: drop duplicate `PCM16_SAMPLE_WIDTH = 2`; import the canonical `PCM16_SAMPLE_WIDTH_BYTES` from `../audio-chunk` and rename call sites (concern #3). - twilio.ts: drop dead `UnsupportedCapabilityError` import + the `export type` re-export that papered over its unused state — base class re-exports via voice/index.ts already (concern #12). - twilio-tunnel.test.ts: wrap cucumber binding in `if (TUNNEL_ENABLED)`; on CI fall back to `describe.skip(...)` with a single placeholder `it` so the runner reports one skipped block instead of five vacuous greens (concern #5). Deferred (documented as follow-ups, not addressed here): - Refactor adapter↔server coupling into a `MediaStreamSession` value object (concern #2). Bigger architectural change; PR3+ executor wiring will exercise the seam first. - Migrate `makeDeferred` to `Promise.withResolvers()` (concern #9). - Replace `rejectedCount` instance field with `getStats()` snapshot (concern #11) — depends on the logger module's contract solidifying. - `call()` Liskov tension (concern #13) — same PR3+ wiring scope. Test surface: 33 passed + 1 skipped (was 27); full suite 409 passed + 1 skipped, build + typecheck green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8 tasks
drewdrewthis
added a commit
that referenced
this pull request
May 22, 2026
…cases Addresses 5 review concerns (review #540 synthesizer pass): - #1 perf: receive-side mulaw buffer now stores Uint8Array slices, not number[]; bufferMulaw is O(1) per call instead of O(n) per byte. - #2 docs: coerceFrameToText's 0x7b/0x5b heuristic is now documented as a known rare-collision risk (binary µ-law with first byte == { or [ would mis-route to JSON parser and silently drop). - #4 test pyramid: round-trip scenario re-tagged @Unit (FakeWebSocket = no network) — real-WSS @integration demo deferred behind env-gated bot endpoint per /browser-qa note. - #5 coverage: 2 new edge-case tests for partial-buffer flush on bot-sent `stop` event and on socket-close. Not addressed in this PR (filed as follow-up considerations): - #3 vestigial audioFormat/sampleRate fields (inherited from Python parity) - #6 DTMF/E.164 validation regex port (pre-requisite for PR11 Twilio) - #8 extract TwilioMediaStreamsTransport helper (PR11 prep) - #9 JSON-frame size cap (no regression vs main; same constraint as Python) - #10 FakeWebSocket vs node:events (cosmetic)
drewdrewthis
added a commit
that referenced
this pull request
May 27, 2026
The bundled Pipecat bot emits a canned greeting on the `connected` event. The old user-first script let that greeting collide with the user's opener, so the first barge-in cut off the GREETING (not a substantive reply) and the bot answered a stale topic — caught by listening to the recording. Open with scenario.agent() to capture the greeting as its own turn (same shape as angry_customer / basic_greeting / random_interruptions), then drive a 2FA walk-through interrupted mid-reply to pivot to a password reset (barge-in #1), then interrupted again to ask for brevity (barge-in #2). Each barge-in now cuts off a SUBSTANTIVE reply and the conversation coheres. Criteria updated to encode the new promise; CODE assertions (transcriptTruncated, fired_after_speech, ratio<0.8, recovery-after-interrupt) unchanged. Intentionally diverges from the user-first Python twin — documented in the file docstring. Regenerated recording (real bot, real OpenAI keys): 9 segments, greeting first, 2 truncated substantive replies, both fired_after_speech, byte-accurate manifest, full.wav 859758 bytes (<1MB). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
Two /review must-fixes: 1. transcribe.test.ts had `void transcribeSegments(...).then(expect...)` inside a synchronous Then callback. The promise resolved after the step completed, so any assertion failure was silently swallowed by vitest. Made the Then async and awaited the call directly. 2. Doc-comment headers in stt/tts/transcribe.test.ts incorrectly cited `@ts-bound`. Updated to cite each file's actual tag (`@ts-stt`, `@ts-tts`, `@ts-transcribe`) so the next reader doesn't get misled. Note: transcribe.test.ts header already said `@ts-transcribe` correctly; only stt.test.ts and tts.test.ts needed updating. Reviewer convergence (3x on #1, 2x on #2): test + principles + hygiene + principles. Refs #516, #517, #513. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
…rotocol tests Review pass on PR #536 surfaced four actionable concerns. Addressed: - **#1 (blocking) — `connect()` left WS without `error`/`close` handlers after `onOpen` called `removeAllListeners()`.** An unhandled `error` on a Node EventEmitter crashes the process. Re-attach `message` + `error` + `close` listeners atomically post-open. The new `error` handler nulls `this.ws` so subsequent `sendAudio`/`receiveAudio` fail fast instead of writing to a dead socket. Pending receivers drain to empty `AudioChunk` so the executor unwinds rather than hanging. - **#2 (blocking) — `onMessage` branches were untested.** Added 14 wire-protocol unit tests (plain vitest, not cucumber-bound) covering: base64 PCM16 decode, odd-byte trim invariant, audio queue/waiter FIFO, ping → pong with `event_id`, ping defensive (no `event_id` skip), `user_transcript` capture, `agent_response` capture, `agent_response_correction` override, format-drift warning, interruption + unknown event swallow, non-JSON frames ignored, post-open socket error drain, socket close drain, and `receiveAudio` timeout. - **#3 — Default LLM identifier was inlined in `eleven-labs-voice-agent.ts`, violating `voice-models.ts`'s self-declared single-source-of-truth contract.** Hoisted `COMPOSABLE_VOICE_LLM_MODEL` + `ELEVENLABS_DEFAULT_VOICE_ID` + `ELEVENLABS_TTS_MODEL` + `ELEVENLABS_STT_MODEL` into `voice-models.ts` (Python parity: `python/scenario/config/voice_models.py`). Adapters now import from there. - **#6 — `receiveAudio` referenced `waiter` from inside the timer body before its `const` declaration.** Worked by event-loop ordering; fragile to refactor. Forward-declared `let timer` and put `waiter` ahead of the `setTimeout` so the dependency graph is explicit. Tests: 411 / 22 files passing (previously 397 / 22; +14 wire-protocol tests). Build: tsup CJS + ESM + DTS clean. Deferred (intentional, tracked in PR body): - #4/#5: inline `pcm16ToWavBytes` + `synthesize` helpers — duplicate-by-design with PR2 (#513); merge-order constraint. - #7: `turnOutputEmitted` latch contract with PR3 executor — surface in PR3 review. - #8: distinguish natural end-of-turn from socket close — design-level, needs PR3 design conversation. - #9: `featurePath()` helper — extract once a 3rd test file would duplicate the climb. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
…cases Addresses 5 review concerns (review #540 synthesizer pass): - #1 perf: receive-side mulaw buffer now stores Uint8Array slices, not number[]; bufferMulaw is O(1) per call instead of O(n) per byte. - #2 docs: coerceFrameToText's 0x7b/0x5b heuristic is now documented as a known rare-collision risk (binary µ-law with first byte == { or [ would mis-route to JSON parser and silently drop). - #4 test pyramid: round-trip scenario re-tagged @Unit (FakeWebSocket = no network) — real-WSS @integration demo deferred behind env-gated bot endpoint per /browser-qa note. - #5 coverage: 2 new edge-case tests for partial-buffer flush on bot-sent `stop` event and on socket-close. Not addressed in this PR (filed as follow-up considerations): - #3 vestigial audioFormat/sampleRate fields (inherited from Python parity) - #6 DTMF/E.164 validation regex port (pre-requisite for PR11 Twilio) - #8 extract TwilioMediaStreamsTransport helper (PR11 prep) - #9 JSON-frame size cap (no regression vs main; same constraint as Python) - #10 FakeWebSocket vs node:events (cosmetic)
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
…g-safe compare, coverage Addresses 8 of the 13 actionable items from the /review fanout: Security: - twilio-server.ts: cap webhook body at 1 MB via streaming guard; reject with HTTP 413 instead of accumulating into memory (concern #7). - twilio-shared.ts: replace hand-rolled XOR signature compare with `crypto.timingSafeEqual` on decoded base64 buffers — Node-stdlib primitive, no DIY constant-time math (concern #10). - twilio-tunnel.ts: drop `(0, eval)("(name) => import(name)")` indirect; use bare dynamic `import()` in try/catch on ERR_MODULE_NOT_FOUND so bundlers and security scanners can analyze the path (concern #8). Coverage (the highest-risk port-only LOC was untested): - twilio.test.ts: codec round-trip — 100 ms 440 Hz sine wave through pcm16/24k → mulaw/8k → pcm16/24k, average abs sample-diff < 2000 (under 10 % of peak). Plus empty-input case. - twilio.test.ts: `verifyTwilioSignature` valid-signature accept, wrong-token reject, wrong-URL reject, missing-signature reject. - twilio.test.ts: `validateE164` + `validateDtmf` accept/reject + the TwiML-injection payload the docstring warns about. - twilio.test.ts: `onDtmf` callback fires on `dtmf` frame, `allowedCallers` filter rejects + records, stop-frame flush enqueues a final AudioChunk. Observability + boy-scout: - twilio-logger.ts (new): minimal `[twilio] …` console wrapper mirroring Python's `logging.getLogger("scenario.voice.twilio")`. Same log sites as the Python parity — body-cap violation, signature rejection, disallowed-caller reject, DTMF receipt, onDtmf callback error (concerns #1 + #14). - twilio-shared.ts: drop duplicate `PCM16_SAMPLE_WIDTH = 2`; import the canonical `PCM16_SAMPLE_WIDTH_BYTES` from `../audio-chunk` and rename call sites (concern #3). - twilio.ts: drop dead `UnsupportedCapabilityError` import + the `export type` re-export that papered over its unused state — base class re-exports via voice/index.ts already (concern #12). - twilio-tunnel.test.ts: wrap cucumber binding in `if (TUNNEL_ENABLED)`; on CI fall back to `describe.skip(...)` with a single placeholder `it` so the runner reports one skipped block instead of five vacuous greens (concern #5). Deferred (documented as follow-ups, not addressed here): - Refactor adapter↔server coupling into a `MediaStreamSession` value object (concern #2). Bigger architectural change; PR3+ executor wiring will exercise the seam first. - Migrate `makeDeferred` to `Promise.withResolvers()` (concern #9). - Replace `rejectedCount` instance field with `getStats()` snapshot (concern #11) — depends on the logger module's contract solidifying. - `call()` Liskov tension (concern #13) — same PR3+ wiring scope. Test surface: 33 passed + 1 skipped (was 27); full suite 409 passed + 1 skipped, build + typecheck green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
…ig (keystone) New voice/config.ts (EDR §0.1 Tier 1 + ADR-002). The keystone of the per-run state model — replaces both the STT module-global (Gap #1) and configure({stt}) (Gap #2): - VoiceConfig { stt?: STTProvider | SttConfig; tts?: TtsConfig; defaultAudioFormat?; audioPlayback?; include{Audio,Timeline,Traces}? } - SttConfig { model; language?; apiKey? }, TtsConfig { voice; format?; apiKey? } - ResolvedVoiceConfig — stt always a concrete provider; the resolved per-run object - resolveVoiceConfig(optionLevel, scenarioLevel, defaults?): two-tier merge with the RunOptions.voice override in front of ScenarioConfig.voice, then pure defaults; `stt` resolves `options?.voice?.stt ?? cfg.voice?.stt ?? new OpenAISTTProvider()` (the default provider constructed per-run — pure default, not shared state). - DEFAULT_STT_MODEL, DEFAULT_AUDIO_FORMAT ("pcm16", the AI-SDK file part per §4.2). stt accepts an STTProvider instance (BYO) or an SttConfig descriptor (routed via resolveSttProvider). AudioFormat is a string union (nothing consumes a richer record yet; AudioChunk fixes 24kHz mono). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
…gure() for global exec
Per EDR §0.1 + ADR-002 + PRD §4.7:
- config/configure.ts: removed the invented `configure({ stt })` provider knob
(present in no other PR, not in Python). `configure()` now carries only global
*execution* settings — `audioPlayback` (PRD §4.7: stream conversation audio to
local speakers). Stored in a module record read by the runner; getGlobalSettings()
exposes it. (audioPlayback is a genuine global UX toggle, not per-run provider
state — the ADR-001 concern is provider/model state flowing into call(), which
this is not.)
- configure.test.ts: rewritten to test the audioPlayback surface + a @ts-expect-error
asserting `stt` is no longer accepted.
- index.ts: updated the stale `configure({ stt })` comment; configure export stays.
Provider config is per-run via run({ voice: { stt, tts } }), not global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
… cascades Gaps #1/#2/#3/#7 + host wiring done; #4 verified intact. Final tsc/test state, remaining 29 SALVAGE markers, Tier B/C cascades (twilio-shared as critical-path blocker, composable de-dup now owed), and intentional EDR deviations recorded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
May 31, 2026
The bundled Pipecat bot emits a canned greeting on the `connected` event. The old user-first script let that greeting collide with the user's opener, so the first barge-in cut off the GREETING (not a substantive reply) and the bot answered a stale topic — caught by listening to the recording. Open with scenario.agent() to capture the greeting as its own turn (same shape as angry_customer / basic_greeting / random_interruptions), then drive a 2FA walk-through interrupted mid-reply to pivot to a password reset (barge-in #1), then interrupted again to ask for brevity (barge-in #2). Each barge-in now cuts off a SUBSTANTIVE reply and the conversation coheres. Criteria updated to encode the new promise; CODE assertions (transcriptTruncated, fired_after_speech, ratio<0.8, recovery-after-interrupt) unchanged. Intentionally diverges from the user-first Python twin — documented in the file docstring. Regenerated recording (real bot, real OpenAI keys): 9 segments, greeting first, 2 truncated substantive replies, both fired_after_speech, byte-accurate manifest, full.wav 859758 bytes (<1MB). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
Two /review must-fixes: 1. transcribe.test.ts had `void transcribeSegments(...).then(expect...)` inside a synchronous Then callback. The promise resolved after the step completed, so any assertion failure was silently swallowed by vitest. Made the Then async and awaited the call directly. 2. Doc-comment headers in stt/tts/transcribe.test.ts incorrectly cited `@ts-bound`. Updated to cite each file's actual tag (`@ts-stt`, `@ts-tts`, `@ts-transcribe`) so the next reader doesn't get misled. Note: transcribe.test.ts header already said `@ts-transcribe` correctly; only stt.test.ts and tts.test.ts needed updating. Reviewer convergence (3x on #1, 2x on #2): test + principles + hygiene + principles. Refs #516, #517, #513. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
…rotocol tests Review pass on PR #536 surfaced four actionable concerns. Addressed: - **#1 (blocking) — `connect()` left WS without `error`/`close` handlers after `onOpen` called `removeAllListeners()`.** An unhandled `error` on a Node EventEmitter crashes the process. Re-attach `message` + `error` + `close` listeners atomically post-open. The new `error` handler nulls `this.ws` so subsequent `sendAudio`/`receiveAudio` fail fast instead of writing to a dead socket. Pending receivers drain to empty `AudioChunk` so the executor unwinds rather than hanging. - **#2 (blocking) — `onMessage` branches were untested.** Added 14 wire-protocol unit tests (plain vitest, not cucumber-bound) covering: base64 PCM16 decode, odd-byte trim invariant, audio queue/waiter FIFO, ping → pong with `event_id`, ping defensive (no `event_id` skip), `user_transcript` capture, `agent_response` capture, `agent_response_correction` override, format-drift warning, interruption + unknown event swallow, non-JSON frames ignored, post-open socket error drain, socket close drain, and `receiveAudio` timeout. - **#3 — Default LLM identifier was inlined in `eleven-labs-voice-agent.ts`, violating `voice-models.ts`'s self-declared single-source-of-truth contract.** Hoisted `COMPOSABLE_VOICE_LLM_MODEL` + `ELEVENLABS_DEFAULT_VOICE_ID` + `ELEVENLABS_TTS_MODEL` + `ELEVENLABS_STT_MODEL` into `voice-models.ts` (Python parity: `python/scenario/config/voice_models.py`). Adapters now import from there. - **#6 — `receiveAudio` referenced `waiter` from inside the timer body before its `const` declaration.** Worked by event-loop ordering; fragile to refactor. Forward-declared `let timer` and put `waiter` ahead of the `setTimeout` so the dependency graph is explicit. Tests: 411 / 22 files passing (previously 397 / 22; +14 wire-protocol tests). Build: tsup CJS + ESM + DTS clean. Deferred (intentional, tracked in PR body): - #4/#5: inline `pcm16ToWavBytes` + `synthesize` helpers — duplicate-by-design with PR2 (#513); merge-order constraint. - #7: `turnOutputEmitted` latch contract with PR3 executor — surface in PR3 review. - #8: distinguish natural end-of-turn from socket close — design-level, needs PR3 design conversation. - #9: `featurePath()` helper — extract once a 3rd test file would duplicate the climb. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
…cases Addresses 5 review concerns (review #540 synthesizer pass): - #1 perf: receive-side mulaw buffer now stores Uint8Array slices, not number[]; bufferMulaw is O(1) per call instead of O(n) per byte. - #2 docs: coerceFrameToText's 0x7b/0x5b heuristic is now documented as a known rare-collision risk (binary µ-law with first byte == { or [ would mis-route to JSON parser and silently drop). - #4 test pyramid: round-trip scenario re-tagged @Unit (FakeWebSocket = no network) — real-WSS @integration demo deferred behind env-gated bot endpoint per /browser-qa note. - #5 coverage: 2 new edge-case tests for partial-buffer flush on bot-sent `stop` event and on socket-close. Not addressed in this PR (filed as follow-up considerations): - #3 vestigial audioFormat/sampleRate fields (inherited from Python parity) - #6 DTMF/E.164 validation regex port (pre-requisite for PR11 Twilio) - #8 extract TwilioMediaStreamsTransport helper (PR11 prep) - #9 JSON-frame size cap (no regression vs main; same constraint as Python) - #10 FakeWebSocket vs node:events (cosmetic)
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
…g-safe compare, coverage Addresses 8 of the 13 actionable items from the /review fanout: Security: - twilio-server.ts: cap webhook body at 1 MB via streaming guard; reject with HTTP 413 instead of accumulating into memory (concern #7). - twilio-shared.ts: replace hand-rolled XOR signature compare with `crypto.timingSafeEqual` on decoded base64 buffers — Node-stdlib primitive, no DIY constant-time math (concern #10). - twilio-tunnel.ts: drop `(0, eval)("(name) => import(name)")` indirect; use bare dynamic `import()` in try/catch on ERR_MODULE_NOT_FOUND so bundlers and security scanners can analyze the path (concern #8). Coverage (the highest-risk port-only LOC was untested): - twilio.test.ts: codec round-trip — 100 ms 440 Hz sine wave through pcm16/24k → mulaw/8k → pcm16/24k, average abs sample-diff < 2000 (under 10 % of peak). Plus empty-input case. - twilio.test.ts: `verifyTwilioSignature` valid-signature accept, wrong-token reject, wrong-URL reject, missing-signature reject. - twilio.test.ts: `validateE164` + `validateDtmf` accept/reject + the TwiML-injection payload the docstring warns about. - twilio.test.ts: `onDtmf` callback fires on `dtmf` frame, `allowedCallers` filter rejects + records, stop-frame flush enqueues a final AudioChunk. Observability + boy-scout: - twilio-logger.ts (new): minimal `[twilio] …` console wrapper mirroring Python's `logging.getLogger("scenario.voice.twilio")`. Same log sites as the Python parity — body-cap violation, signature rejection, disallowed-caller reject, DTMF receipt, onDtmf callback error (concerns #1 + #14). - twilio-shared.ts: drop duplicate `PCM16_SAMPLE_WIDTH = 2`; import the canonical `PCM16_SAMPLE_WIDTH_BYTES` from `../audio-chunk` and rename call sites (concern #3). - twilio.ts: drop dead `UnsupportedCapabilityError` import + the `export type` re-export that papered over its unused state — base class re-exports via voice/index.ts already (concern #12). - twilio-tunnel.test.ts: wrap cucumber binding in `if (TUNNEL_ENABLED)`; on CI fall back to `describe.skip(...)` with a single placeholder `it` so the runner reports one skipped block instead of five vacuous greens (concern #5). Deferred (documented as follow-ups, not addressed here): - Refactor adapter↔server coupling into a `MediaStreamSession` value object (concern #2). Bigger architectural change; PR3+ executor wiring will exercise the seam first. - Migrate `makeDeferred` to `Promise.withResolvers()` (concern #9). - Replace `rejectedCount` instance field with `getStats()` snapshot (concern #11) — depends on the logger module's contract solidifying. - `call()` Liskov tension (concern #13) — same PR3+ wiring scope. Test surface: 33 passed + 1 skipped (was 27); full suite 409 passed + 1 skipped, build + typecheck green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
…ig (keystone) New voice/config.ts (EDR §0.1 Tier 1 + ADR-002). The keystone of the per-run state model — replaces both the STT module-global (Gap #1) and configure({stt}) (Gap #2): - VoiceConfig { stt?: STTProvider | SttConfig; tts?: TtsConfig; defaultAudioFormat?; audioPlayback?; include{Audio,Timeline,Traces}? } - SttConfig { model; language?; apiKey? }, TtsConfig { voice; format?; apiKey? } - ResolvedVoiceConfig — stt always a concrete provider; the resolved per-run object - resolveVoiceConfig(optionLevel, scenarioLevel, defaults?): two-tier merge with the RunOptions.voice override in front of ScenarioConfig.voice, then pure defaults; `stt` resolves `options?.voice?.stt ?? cfg.voice?.stt ?? new OpenAISTTProvider()` (the default provider constructed per-run — pure default, not shared state). - DEFAULT_STT_MODEL, DEFAULT_AUDIO_FORMAT ("pcm16", the AI-SDK file part per §4.2). stt accepts an STTProvider instance (BYO) or an SttConfig descriptor (routed via resolveSttProvider). AudioFormat is a string union (nothing consumes a richer record yet; AudioChunk fixes 24kHz mono). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
…gure() for global exec
Per EDR §0.1 + ADR-002 + PRD §4.7:
- config/configure.ts: removed the invented `configure({ stt })` provider knob
(present in no other PR, not in Python). `configure()` now carries only global
*execution* settings — `audioPlayback` (PRD §4.7: stream conversation audio to
local speakers). Stored in a module record read by the runner; getGlobalSettings()
exposes it. (audioPlayback is a genuine global UX toggle, not per-run provider
state — the ADR-001 concern is provider/model state flowing into call(), which
this is not.)
- configure.test.ts: rewritten to test the audioPlayback surface + a @ts-expect-error
asserting `stt` is no longer accepted.
- index.ts: updated the stale `configure({ stt })` comment; configure export stays.
Provider config is per-run via run({ voice: { stt, tts } }), not global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
… cascades Gaps #1/#2/#3/#7 + host wiring done; #4 verified intact. Final tsc/test state, remaining 29 SALVAGE markers, Tier B/C cascades (twilio-shared as critical-path blocker, composable de-dup now owed), and intentional EDR deviations recorded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rogeriochaves
pushed a commit
that referenced
this pull request
Jun 2, 2026
The bundled Pipecat bot emits a canned greeting on the `connected` event. The old user-first script let that greeting collide with the user's opener, so the first barge-in cut off the GREETING (not a substantive reply) and the bot answered a stale topic — caught by listening to the recording. Open with scenario.agent() to capture the greeting as its own turn (same shape as angry_customer / basic_greeting / random_interruptions), then drive a 2FA walk-through interrupted mid-reply to pivot to a password reset (barge-in #1), then interrupted again to ask for brevity (barge-in #2). Each barge-in now cuts off a SUBSTANTIVE reply and the conversation coheres. Criteria updated to encode the new promise; CODE assertions (transcriptTruncated, fired_after_speech, ratio<0.8, recovery-after-interrupt) unchanged. Intentionally diverges from the user-first Python twin — documented in the file docstring. Regenerated recording (real bot, real OpenAI keys): 9 segments, greeting first, 2 truncated substantive replies, both fired_after_speech, byte-accurate manifest, full.wav 859758 bytes (<1MB). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
Two /review must-fixes: 1. transcribe.test.ts had `void transcribeSegments(...).then(expect...)` inside a synchronous Then callback. The promise resolved after the step completed, so any assertion failure was silently swallowed by vitest. Made the Then async and awaited the call directly. 2. Doc-comment headers in stt/tts/transcribe.test.ts incorrectly cited `@ts-bound`. Updated to cite each file's actual tag (`@ts-stt`, `@ts-tts`, `@ts-transcribe`) so the next reader doesn't get misled. Note: transcribe.test.ts header already said `@ts-transcribe` correctly; only stt.test.ts and tts.test.ts needed updating. Reviewer convergence (3x on #1, 2x on #2): test + principles + hygiene + principles. Refs #516, #517, #513. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…rotocol tests Review pass on PR #536 surfaced four actionable concerns. Addressed: - **#1 (blocking) — `connect()` left WS without `error`/`close` handlers after `onOpen` called `removeAllListeners()`.** An unhandled `error` on a Node EventEmitter crashes the process. Re-attach `message` + `error` + `close` listeners atomically post-open. The new `error` handler nulls `this.ws` so subsequent `sendAudio`/`receiveAudio` fail fast instead of writing to a dead socket. Pending receivers drain to empty `AudioChunk` so the executor unwinds rather than hanging. - **#2 (blocking) — `onMessage` branches were untested.** Added 14 wire-protocol unit tests (plain vitest, not cucumber-bound) covering: base64 PCM16 decode, odd-byte trim invariant, audio queue/waiter FIFO, ping → pong with `event_id`, ping defensive (no `event_id` skip), `user_transcript` capture, `agent_response` capture, `agent_response_correction` override, format-drift warning, interruption + unknown event swallow, non-JSON frames ignored, post-open socket error drain, socket close drain, and `receiveAudio` timeout. - **#3 — Default LLM identifier was inlined in `eleven-labs-voice-agent.ts`, violating `voice-models.ts`'s self-declared single-source-of-truth contract.** Hoisted `COMPOSABLE_VOICE_LLM_MODEL` + `ELEVENLABS_DEFAULT_VOICE_ID` + `ELEVENLABS_TTS_MODEL` + `ELEVENLABS_STT_MODEL` into `voice-models.ts` (Python parity: `python/scenario/config/voice_models.py`). Adapters now import from there. - **#6 — `receiveAudio` referenced `waiter` from inside the timer body before its `const` declaration.** Worked by event-loop ordering; fragile to refactor. Forward-declared `let timer` and put `waiter` ahead of the `setTimeout` so the dependency graph is explicit. Tests: 411 / 22 files passing (previously 397 / 22; +14 wire-protocol tests). Build: tsup CJS + ESM + DTS clean. Deferred (intentional, tracked in PR body): - #4/#5: inline `pcm16ToWavBytes` + `synthesize` helpers — duplicate-by-design with PR2 (#513); merge-order constraint. - #7: `turnOutputEmitted` latch contract with PR3 executor — surface in PR3 review. - #8: distinguish natural end-of-turn from socket close — design-level, needs PR3 design conversation. - #9: `featurePath()` helper — extract once a 3rd test file would duplicate the climb. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…cases Addresses 5 review concerns (review #540 synthesizer pass): - #1 perf: receive-side mulaw buffer now stores Uint8Array slices, not number[]; bufferMulaw is O(1) per call instead of O(n) per byte. - #2 docs: coerceFrameToText's 0x7b/0x5b heuristic is now documented as a known rare-collision risk (binary µ-law with first byte == { or [ would mis-route to JSON parser and silently drop). - #4 test pyramid: round-trip scenario re-tagged @Unit (FakeWebSocket = no network) — real-WSS @integration demo deferred behind env-gated bot endpoint per /browser-qa note. - #5 coverage: 2 new edge-case tests for partial-buffer flush on bot-sent `stop` event and on socket-close. Not addressed in this PR (filed as follow-up considerations): - #3 vestigial audioFormat/sampleRate fields (inherited from Python parity) - #6 DTMF/E.164 validation regex port (pre-requisite for PR11 Twilio) - #8 extract TwilioMediaStreamsTransport helper (PR11 prep) - #9 JSON-frame size cap (no regression vs main; same constraint as Python) - #10 FakeWebSocket vs node:events (cosmetic)
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…g-safe compare, coverage Addresses 8 of the 13 actionable items from the /review fanout: Security: - twilio-server.ts: cap webhook body at 1 MB via streaming guard; reject with HTTP 413 instead of accumulating into memory (concern #7). - twilio-shared.ts: replace hand-rolled XOR signature compare with `crypto.timingSafeEqual` on decoded base64 buffers — Node-stdlib primitive, no DIY constant-time math (concern #10). - twilio-tunnel.ts: drop `(0, eval)("(name) => import(name)")` indirect; use bare dynamic `import()` in try/catch on ERR_MODULE_NOT_FOUND so bundlers and security scanners can analyze the path (concern #8). Coverage (the highest-risk port-only LOC was untested): - twilio.test.ts: codec round-trip — 100 ms 440 Hz sine wave through pcm16/24k → mulaw/8k → pcm16/24k, average abs sample-diff < 2000 (under 10 % of peak). Plus empty-input case. - twilio.test.ts: `verifyTwilioSignature` valid-signature accept, wrong-token reject, wrong-URL reject, missing-signature reject. - twilio.test.ts: `validateE164` + `validateDtmf` accept/reject + the TwiML-injection payload the docstring warns about. - twilio.test.ts: `onDtmf` callback fires on `dtmf` frame, `allowedCallers` filter rejects + records, stop-frame flush enqueues a final AudioChunk. Observability + boy-scout: - twilio-logger.ts (new): minimal `[twilio] …` console wrapper mirroring Python's `logging.getLogger("scenario.voice.twilio")`. Same log sites as the Python parity — body-cap violation, signature rejection, disallowed-caller reject, DTMF receipt, onDtmf callback error (concerns #1 + #14). - twilio-shared.ts: drop duplicate `PCM16_SAMPLE_WIDTH = 2`; import the canonical `PCM16_SAMPLE_WIDTH_BYTES` from `../audio-chunk` and rename call sites (concern #3). - twilio.ts: drop dead `UnsupportedCapabilityError` import + the `export type` re-export that papered over its unused state — base class re-exports via voice/index.ts already (concern #12). - twilio-tunnel.test.ts: wrap cucumber binding in `if (TUNNEL_ENABLED)`; on CI fall back to `describe.skip(...)` with a single placeholder `it` so the runner reports one skipped block instead of five vacuous greens (concern #5). Deferred (documented as follow-ups, not addressed here): - Refactor adapter↔server coupling into a `MediaStreamSession` value object (concern #2). Bigger architectural change; PR3+ executor wiring will exercise the seam first. - Migrate `makeDeferred` to `Promise.withResolvers()` (concern #9). - Replace `rejectedCount` instance field with `getStats()` snapshot (concern #11) — depends on the logger module's contract solidifying. - `call()` Liskov tension (concern #13) — same PR3+ wiring scope. Test surface: 33 passed + 1 skipped (was 27); full suite 409 passed + 1 skipped, build + typecheck green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…ig (keystone) New voice/config.ts (EDR §0.1 Tier 1 + ADR-002). The keystone of the per-run state model — replaces both the STT module-global (Gap #1) and configure({stt}) (Gap #2): - VoiceConfig { stt?: STTProvider | SttConfig; tts?: TtsConfig; defaultAudioFormat?; audioPlayback?; include{Audio,Timeline,Traces}? } - SttConfig { model; language?; apiKey? }, TtsConfig { voice; format?; apiKey? } - ResolvedVoiceConfig — stt always a concrete provider; the resolved per-run object - resolveVoiceConfig(optionLevel, scenarioLevel, defaults?): two-tier merge with the RunOptions.voice override in front of ScenarioConfig.voice, then pure defaults; `stt` resolves `options?.voice?.stt ?? cfg.voice?.stt ?? new OpenAISTTProvider()` (the default provider constructed per-run — pure default, not shared state). - DEFAULT_STT_MODEL, DEFAULT_AUDIO_FORMAT ("pcm16", the AI-SDK file part per §4.2). stt accepts an STTProvider instance (BYO) or an SttConfig descriptor (routed via resolveSttProvider). AudioFormat is a string union (nothing consumes a richer record yet; AudioChunk fixes 24kHz mono). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…gure() for global exec
Per EDR §0.1 + ADR-002 + PRD §4.7:
- config/configure.ts: removed the invented `configure({ stt })` provider knob
(present in no other PR, not in Python). `configure()` now carries only global
*execution* settings — `audioPlayback` (PRD §4.7: stream conversation audio to
local speakers). Stored in a module record read by the runner; getGlobalSettings()
exposes it. (audioPlayback is a genuine global UX toggle, not per-run provider
state — the ADR-001 concern is provider/model state flowing into call(), which
this is not.)
- configure.test.ts: rewritten to test the audioPlayback surface + a @ts-expect-error
asserting `stt` is no longer accepted.
- index.ts: updated the stale `configure({ stt })` comment; configure export stays.
Provider config is per-run via run({ voice: { stt, tts } }), not global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
… cascades Gaps #1/#2/#3/#7 + host wiring done; #4 verified intact. Final tsc/test state, remaining 29 SALVAGE markers, Tier B/C cascades (twilio-shared as critical-path blocker, composable de-dup now owed), and intentional EDR deviations recorded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
The bundled Pipecat bot emits a canned greeting on the `connected` event. The old user-first script let that greeting collide with the user's opener, so the first barge-in cut off the GREETING (not a substantive reply) and the bot answered a stale topic — caught by listening to the recording. Open with scenario.agent() to capture the greeting as its own turn (same shape as angry_customer / basic_greeting / random_interruptions), then drive a 2FA walk-through interrupted mid-reply to pivot to a password reset (barge-in #1), then interrupted again to ask for brevity (barge-in #2). Each barge-in now cuts off a SUBSTANTIVE reply and the conversation coheres. Criteria updated to encode the new promise; CODE assertions (transcriptTruncated, fired_after_speech, ratio<0.8, recovery-after-interrupt) unchanged. Intentionally diverges from the user-first Python twin — documented in the file docstring. Regenerated recording (real bot, real OpenAI keys): 9 segments, greeting first, 2 truncated substantive replies, both fired_after_speech, byte-accurate manifest, full.wav 859758 bytes (<1MB). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drewdrewthis
added a commit
that referenced
this pull request
Jun 4, 2026
…561) * docs(#372): voice internal design record + ADR-002 (per-run provider state) Engineering Design Record for the TypeScript voice port (#372): the inside-the-box design the PRD (API proposal) never specified. Pairs the module tree + per-module contract catalog (target vs as-built gap analysis across the voice PR series) with ADR-002, which moves STT/TTS provider state off a module-global singleton onto per-run ScenarioConfig.voice (the only per-run carrier that reaches AgentAdapter.call), removes the invented scenario.configure({stt}) surface, and standardizes one in-message audio format (fixing a live WAV-vs-PCM decode mismatch). Spec only — no runtime change. The clean voice stack is built against this. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice TTS + STT plumbing (PR2 of N) Ports python/scenario/voice/{tts,stt,_transcribe}.py to TypeScript and exposes scenario.configure({ stt }) for swapping the default STT provider. - voice/tts.ts: synthesize(text, voice, effectFn?) + LRU(64) keyed on sha256(text)+voice. Effects apply AFTER cache hit per the locked decision; raw text never reaches the cache payload. - voice/stt.ts: STTProvider interface, OpenAISTTProvider default (gpt-4o-transcribe) with 25-minute chunking, ElevenLabsSTTProvider, setSttProvider / getSttProvider for swap. Pure-TS pcm16-to-wav encoder — no transcription-only ffmpeg dep. - voice/transcribe.ts: transcribeSegments — post-hoc, idempotent per-segment, degrades gracefully when no provider is configured. - config/configure.ts: scenario.configure({ stt }) entry point. Tests in follow-up commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(typescript-sdk/#372): bind 7 voice TTS+STT scenarios in vitest - tts.test.ts: cache key is (sha256(text), voice); effects apply AFTER cache hit (third call with new effect reads ORIGINAL cached PCM, not effect-baked bytes). - stt.test.ts: default model = gpt-4o-transcribe; provider swap via setSttProvider; STTProvider interface minimal (no OpenAI types leak); >25-min audio splits into sub-chunks with concatenated transcripts. - transcribe.test.ts: transcribeSegments fills missing transcripts in place, skips already-filled segments; missing STT degrades gracefully with a warning and never raises. - configure.test.ts: scenario.configure({ stt }) round-trips a custom provider; null clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(typescript-sdk/#372): bind 7 voice TTS+STT scenarios via vitest-cucumber Retrofits PR #513's hand-rolled tests so the 7 scenarios they claim to cover actually load and execute against specs/voice-agents.feature via @amiceli/vitest-cucumber, matching the pattern landed by #517. Scenarios tagged @ts-tts, @ts-stt, @ts-transcribe (domain-specific sub-tags alongside @unit) so each test file's includeTags filter targets exactly the scenarios it owns without disturbing voice-contract-surface.test.ts (which uses @ts-bound for the original 5 scenarios from PR1). - tts.test.ts: loadFeature + describeFeature({ includeTags: ["ts-tts"] }) binding "TTS cache key is (text, voice) only and effects apply after cache hit" - stt.test.ts: loadFeature + describeFeature({ includeTags: ["ts-stt"] }) binding 4 STT scenarios: default gpt-4o-transcribe, provider swap, minimal interface, >25-min chunking - transcribe.test.ts: loadFeature + describeFeature({ includeTags: ["ts-transcribe"] }) binding transcribe_segments fills-in-place + missing STT degrades gracefully Refs #516 (spec-binding retrofit), #517 (PR-A, merged), #372 (slice plan). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): await floating promise; align doc headers with actual tags Two /review must-fixes: 1. transcribe.test.ts had `void transcribeSegments(...).then(expect...)` inside a synchronous Then callback. The promise resolved after the step completed, so any assertion failure was silently swallowed by vitest. Made the Then async and awaited the call directly. 2. Doc-comment headers in stt/tts/transcribe.test.ts incorrectly cited `@ts-bound`. Updated to cite each file's actual tag (`@ts-stt`, `@ts-tts`, `@ts-transcribe`) so the next reader doesn't get misled. Note: transcribe.test.ts header already said `@ts-transcribe` correctly; only stt.test.ts and tts.test.ts needed updating. Reviewer convergence (3x on #1, 2x on #2): test + principles + hygiene + principles. Refs #516, #517, #513. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice adapter runtime + executor wiring + VAD fallback (WIP) PR3 of N for langwatch/scenario#372. Builds on PR1 (#511) types. - Port `python/scenario/voice/adapter.py` runtime to `voice/adapter.runtime.ts`: * `asyncio.Event` -> `AgentSpeakingEvent` (Promise + resolve ref) * `async with` -> explicit `startVoiceAdapters` / `stopVoiceAdapters` * Default `call()` body: send -> drain on tail silence -> record -> return * Hook fan-out for `onAudioChunk` / `onVoiceEvent` - Port `python/scenario/voice/vad.py` -> `voice/vad.ts`: * `WebRTCVadFallback` with one-shot warning per adapter (matches Python `_warned_adapters` memoisation, no rate-limit regression) * Activates only when `adapter.capabilities.nativeVad === false` * Pure-TS RMS energy + hysteresis detector ships today; webrtcvad C-library build pipeline is the decision-pending item. - Patch `execution/scenario-execution.ts`: * Implement `VoiceExecutorState` structurally (Decision 1(b) from #372) * Pick voice adapters at run start; connect inside try, disconnect in finally so the spec-148-145 "regardless of pass/fail/exception" contract holds. * Wire `onAudioChunk` / `onVoiceEvent` from `ScenarioConfig`. - Add `voice/__tests__/fixtures/fake-adapter.ts`: in-memory adapter, no real transport. Tests use this exclusively. - Tests (vitest, bound to `specs/voice-agents.feature`): * `adapter-lifecycle.test.ts` lines 138-145 * `hooks.test.ts` lines 449-461 * `vad-fallback.test.ts` lines 772-791 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(typescript-sdk/#372): re-attach voice executor ref after reset(); fail-on-call fixture - ScenarioExecution.reset() recreated ScenarioExecutionState, losing the setExecutor linkage from the constructor. Voice adapters reaching input.scenarioState._executor would see null for the rest of the run, so hook fan-out / recorder never wrote into voice state. Re-attach in reset() so the linkage survives. - FakeVoiceAdapter gains a failOnCall option — cleaner than spawning a second AGENT-role agent that would compete with the fake adapter for the agent() step (the executor picks the first role-matching agent). - All 4 voice test files now green (21/21 voice tests, 381/381 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(typescript-sdk/#372): bind voice adapter+hooks+VAD scenarios via vitest-cucumber Retrofits PR #515's hand-rolled tests for adapter lifecycle, hooks, and VAD fallback to actually load and execute specs/voice-agents.feature via @amiceli/vitest-cucumber, matching the pattern landed by #517 and #513. Tags by test file (per-file tagging needed because vitest-cucumber v6 fails the suite for scenarios that match a file's includeTags but aren't bound in that file): - @ts-adapter: connect/disconnect fires per-scenario - @ts-hooks: on_audio_chunk and on_voice_event fire - @ts-vad: VAD fallback / native-VAD does not trigger / one-shot warning Key implementation note: vitest-cucumber v6 runs each Given/When/Then step as a separate vitest it(). Module-level beforeEach/afterEach hooks fire around each step, not around the whole scenario. For scenarios that need to assert on console.warn calls across step boundaries, the spy is installed locally within the When step and captured warn messages are carried via closure-scoped variables into Then/And — avoiding the floating-promise and spy-reset antipatterns. Refs #516 (spec-binding retrofit), #517 (PR-A, merged), #513 (PR-B, ready for review), #372 (slice plan). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test/#515): use BeforeEachScenario; split packed scenarios Three /review must-fixes: 1. vad-fallback.test.ts: replaced the closure-capture spy pattern with the library's BeforeEachScenario/AfterEachScenario hooks. The coder's earlier workaround was based on the false belief that vitest-cucumber lacked scenario-level lifecycle hooks. The hooks exist (verified at @amiceli/vitest-cucumber 6.5.0 describe-feature.js:311-322). BeforeEachScenario fires via beforeAll inside the scenario describe block — once per scenario, not per step. Spy is shared; capturedWarnCalls accumulates across steps within the same scenario. Removed ~28 lines of SPY STRATEGY prose comments. 2. hooks.test.ts: extracted the "throwing hook doesn't break scenario" check from inside the on_voice_event scenario's When step. It was asserting behavior the bound feature scenario didn't claim. Now a plain it() block outside describeFeature. Option (a) chosen: no spec scenario exists for this behavior in voice-agents.feature. 3. adapter-lifecycle.test.ts: split 5 sub-cases out of one packed And step. Kept only the happy-path disconnect assertion in the bound And step (disconnect fires once on success). Lifted fail/throw/ multi-adapter/disconnect-swallow to 4 plain it() blocks. Option (b) chosen: specs/voice-agents.feature line 143 names the And step as a single AC ("regardless of pass/fail/exception") — the 4 sub-cases are implementation-level guarantees not individually specced. Reviewer convergence: principles + test (3x). Refs #516, #517, #513, #515. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice-aware UserSimulatorAgent + judge + audio messages (PR4 of N) Ports the python voice path for simulator and judge to TypeScript: - javascript/src/voice/messages.ts: createAudioMessage/extractAudio/ messageHasAudio helpers using the local AudioMessageParam type. No openai package import — uses messages.types.ts (Decision 2(b)). - javascript/src/agents/user-simulator-agent.ts: voice config triggers audio-message emission; per-step voice + per-step audio_effects + persona composition. stripAudioContent keeps LLM calls text-only. - javascript/src/agents/judge/judge-agent.ts: JudgeAgent exported as class with static conversationHasAudio; effectiveIncludeAudio/Timeline/Traces helpers; auto-detect multimodal model via model name substrings; include_audio=false escape hatch. 13 scenarios bound to specs/voice-agents.feature via vitest-cucumber: - 5 simulator scenarios (@ts-simulator) - 7 judge scenarios (@ts-judge) - 1 assistant-role scenario (@ts-assistant-role) Tag convention: per-subject (@ts-simulator / @ts-judge / @ts-assistant-role) instead of @ts-bound to avoid colliding with PR1's voice-contract-surface test (which uses includeTags: ["ts-bound"] and would over-match new scenarios). Per-file tagging is established by #513/#515; tag-convention decision tracked at #523. Refs #372 (slice plan), #517 (PR1 infra, merged), #513 (PR2, ready), Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test/#528): drop voiceStyle override binding, split packed Thens, minor cleanups /review surfaced 4 Must-Fix carry-forwards from prior PRs: 1. "Per-step voice override applies to only that step" scenario asserts no observable behavior — voiceStyle is set/cleared via setOneShotOverride but no TTS provider honors it. Spec retagged @todo (removed @ts-simulator) so future PRs that wire voiceStyle into _synthesize can re-bind. Test block removed. Honest absence beats paraphrase-as-binding. PR4 now binds 12 scenarios (was 13). 2. voice-assistant-role.test.ts doc-comment claimed @integration but feature file tags @unit. Fixed. Also fixed an internal comment that said "Python SDK" when the context was "TS SDK". 3. judge-voice.test.ts had 4-5 packed Then blocks (multi-model sub-cases stuffed into single bound Thens). Lifted sub-cases to plain it() blocks outside describeFeature; bound Thens now assert only spec-named behavior. 4. Hoisted mid-file zod import to top of judge-agent.ts. Reviewer convergence: principles, hygiene, test. Refs #528, #516, #372. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice script steps + interruption + result extensions (PR5 of N) PR5 of the TS voice parity slice. Pure SDK orchestration — no external service is touched, no UI runs. Wires the script-step DSL, interruption config, recording runtime, and the optional ScenarioResult voice fields behind the same contract surface the Python SDK already ships. Adds: * javascript/src/script/voice-steps.ts — sleep, silence, audio, dtmf, interrupt (after-time + after-words), agent({ wait: false }), proceed({ interruptions, onTurn, onStep }), backgroundNoise. Imports from `@langwatch/scenario` script barrel as `voiceAgent` / `voiceProceed` so the existing positional `agent`/`proceed` stay untouched for callers. * javascript/src/voice/interruption.ts — InterruptionConfig class with shouldInterrupt / sampleDelay / pickRandomPhrase. RNG-pluggable so callers can pass a seeded PRNG for deterministic tests. CONTEXTUAL_PROMPT exported as a module-level constant. * javascript/src/voice/recording.runtime.ts — VoiceRecordingRuntime with WAV writer (native; canonical PCM16/24kHz/mono RIFF header) and MP3/OGG/FLAC via system ffmpeg subprocess. saveSegments() writes the segments dir + full.wav + JSON manifest. computeLatencyMetrics() aggregates avg/p50/p95 with ceiling-style p95. * ScenarioResult gains optional `audio`/`timeline`/`latency` fields — text-only runs leave them undefined (back-compat preserved). Test files (all bound via vitest-cucumber against specs/voice-agents.feature): * src/script/__tests__/voice-steps.test.ts (11 scenarios, @ts-script-step) * src/voice/__tests__/interruption.test.ts (1 bound + 2 unit, @ts-interruption-cfg) * src/voice/__tests__/recording.runtime.test.ts (7 unit — not feature-bound) * src/voice/__tests__/result-extensions.test.ts (6 scenarios, @ts-result-ext) Spec tags: @ts-script-step / @ts-interruption-cfg / @ts-result-ext sub-tags scope each PR5 file's binding set; voice-contract-surface.test.ts now uses excludeTags to keep ownership of the PR1 contract-surface set only. Tsconfig: target=ES2022 so top-level await (vitest-cucumber pattern) and `Set` iteration land without --downlevelIteration shims. ffmpeg distribution decision pending — see PR body for options. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): replace private-attr indirection with typed surfaces Addresses /review concerns on PR5: - Lift voiceInterruptions + voiceBackgroundNoise onto VoiceExecutorState so voiceProceed/backgroundNoise write through the same typed contract the voice subsystem already commits to (Decision 1(b) of #372). Drops three `as unknown as { _voice* }` indirections from voice-steps.ts. - Expose agentSpeakingEvent + streamingTranscript + sendDtmf on VoiceAgentAdapter as optional/abstractable members. dtmf() now calls adapter.sendDtmf() directly — adapters that claim capabilities.dtmf while skipping the method get a loud UnsupportedCapabilityError from the base class instead of a silent PCM synthesizer fallback. - Add bounded timeout to waitForStreamingWords so a wedged adapter that never advances its transcript can't lock the script forever (mirrors waitForAgentSpeaking's pattern). - audio() URL_LIKE error message no longer suggests "download the asset locally" when the input is already a file:// URI. - recording.runtime.test.ts skips MP3 transcoding cleanly when ffmpeg is not on PATH (itIfFfmpeg guard). - Drop the unused DTMF PCM-synth fallback now that capability-method coupling is enforced at the base class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice effects module + bundled noise assets (PR6 of N) Ports python/scenario/voice/effects/* to javascript/src/voice/effects/*: - common.ts (EffectFn type, PCM16 <-> Int16Array helpers) - noise.ts (backgroundNoise, static_, multipleVoices) + 5 bundled WAVs - prosody.ts (lowVolume, highVolume, speakingFast, speakingSlow) - quality.ts (phoneQuality via fft.js, lowQuality, packetLoss, echo, robotic, breakingUp) - custom.ts (user-fn wrapper with type validation) - index.ts barrel re-exporting static_ as static Adds fft.js dep (FFT for phoneQuality bandpass). Updates tsup.config.ts to cpSync src/voice/assets to dist/voice/assets; package.json files includes src/voice/assets/** so WAVs ship in published npm package. Bundle delta ~132KB (5 x 24KB WAVs + LICENSES) — under the 1MB budget. Binds 5 scenarios in specs/voice-agents.feature with tag @ts-effects (per-subject tag, NOT @ts-bound, to avoid collision with PR #517's voice-contract-surface.test.ts that already owns @ts-bound; follows PR #528 convention from issue #523). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/#372): address PR #537 review — public API + cleanups Review fanout flagged: - effects unreachable via voice namespace (voice/index.ts had no re-export) - TS2802 on [...BACKGROUND_PRESETS].sort() (Set iteration) - require('fft.js') with manual type cast + eslint suppression - conjugate-symmetry mirror hand-rolled instead of fft.completeSpectrum() - 3 near-identical linearResample loops across noise/prosody/quality - double static_/static export (pick one for the public name) Fixes: - voice/index.ts: export * as effects from './effects' - effects.test.ts: regression assertion via voice namespace import - noise.ts: Array.from() instead of spread; use linearResample helper - quality.ts: import FFT from 'fft.js'; fft.completeSpectrum(); linearResample x2 - prosody.ts: linearResample helper - common.ts: new linearResample(arr, newLen): Int16Array - effects/index.ts: drop bare static_ re-export, keep only static alias - effects.test.ts: JSDoc note that on_turn Scenario binding is a unit-level proxy for the runtime hook that lands in PR3 (#515) pnpm -C javascript build: green pnpm -C javascript test: 22 files / 392 tests pass pnpm -C javascript typecheck: pre-existing TS1378 from PR #517 only; no new errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(voice/effects): broaden public-API regression; unify resample idiom Review nits from re-review of PR #537: - public-API surface test asserted only 3 callables; iterate all 14 §4.5 effects so a missing barrel re-export fails fast. - prosody._resampleFactor wrapped linearResample with int16ToPcm16 while quality.lowQuality used `new Uint8Array(buf.buffer)`. The clip in int16ToPcm16 is a no-op on Int16Array input — use the zero-copy view in both places. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice ElevenLabs adapter + composable + branded (PR7 of N) PR7 of issue #372 — the first real voice transport. Ports three Python adapters to TS and binds 7 scenarios in `specs/voice-agents.feature`. What lands: - `javascript/src/voice/adapters/elevenlabs.ts` — `ElevenLabsAgentAdapter`, the hosted ConvAI adapter. Connects to `wss://api.elevenlabs.io/v1/convai/conversation` via the `ws` package; PCM16/24kHz base64-over-JSON; full event handling (audio, ping, transcript, correction, init-metadata, interruption). Mirrors `python/scenario/voice/adapters/elevenlabs.py`. - `javascript/src/voice/adapters/composable.ts` — `ComposableVoiceAgent` + `STTProvider` interface + `ElevenLabsSTTProvider` + inline `synthesize` helper (elevenlabs/ provider only — PR2 #513 supplies the rest). LLM is any ai-sdk `LanguageModel`. Mirrors `python/scenario/voice/adapters/composable.py`. - `javascript/src/voice/adapters/eleven-labs-voice-agent.ts` — `ElevenLabsVoiceAgent`, the branded preset. Provider-typed options; defaults to `ElevenLabsSTTProvider` + `openai("gpt-5.4-mini")` + `elevenlabs/EXAVITQu4vr4xnSDxMaL` (Sarah — free-tier premade); each piece independently overridable. `eleven_v3` TTS model hardcoded for paralinguistic-marker support (per Python tts.py:107 comment). Tests: - `javascript/src/voice/adapters/__tests__/elevenlabs.test.ts` — 5 unit scenarios bound via `describeFeature(..., { includeTags: [["unit", "ts-elevenlabs"]] })`. - `javascript/examples/vitest/tests/voice/elevenlabs-hosted.test.ts` — 2 e2e scenarios env-gated on `ELEVENLABS_API_KEY` (+ `ELEVENLABS_AGENT_ID` for the hosted demo). Without keys, the suite cleanly skips. Tag convention: `@ts-elevenlabs` (per-subject) rather than `@ts-bound` — per the precedent from PRs #517 / #528 (`@ts-simulator`, `@ts-judge`, `@ts-assistant-role`), per-subject tags avoid the `checkUncalledScenario` collision with PR1's contract-surface test. See #523 for the tag-convention decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/#372): address review concerns 1/3/6 + add onMessage wire-protocol tests Review pass on PR #536 surfaced four actionable concerns. Addressed: - **#1 (blocking) — `connect()` left WS without `error`/`close` handlers after `onOpen` called `removeAllListeners()`.** An unhandled `error` on a Node EventEmitter crashes the process. Re-attach `message` + `error` + `close` listeners atomically post-open. The new `error` handler nulls `this.ws` so subsequent `sendAudio`/`receiveAudio` fail fast instead of writing to a dead socket. Pending receivers drain to empty `AudioChunk` so the executor unwinds rather than hanging. - **#2 (blocking) — `onMessage` branches were untested.** Added 14 wire-protocol unit tests (plain vitest, not cucumber-bound) covering: base64 PCM16 decode, odd-byte trim invariant, audio queue/waiter FIFO, ping → pong with `event_id`, ping defensive (no `event_id` skip), `user_transcript` capture, `agent_response` capture, `agent_response_correction` override, format-drift warning, interruption + unknown event swallow, non-JSON frames ignored, post-open socket error drain, socket close drain, and `receiveAudio` timeout. - **#3 — Default LLM identifier was inlined in `eleven-labs-voice-agent.ts`, violating `voice-models.ts`'s self-declared single-source-of-truth contract.** Hoisted `COMPOSABLE_VOICE_LLM_MODEL` + `ELEVENLABS_DEFAULT_VOICE_ID` + `ELEVENLABS_TTS_MODEL` + `ELEVENLABS_STT_MODEL` into `voice-models.ts` (Python parity: `python/scenario/config/voice_models.py`). Adapters now import from there. - **#6 — `receiveAudio` referenced `waiter` from inside the timer body before its `const` declaration.** Worked by event-loop ordering; fragile to refactor. Forward-declared `let timer` and put `waiter` ahead of the `setTimeout` so the dependency graph is explicit. Tests: 411 / 22 files passing (previously 397 / 22; +14 wire-protocol tests). Build: tsup CJS + ESM + DTS clean. Deferred (intentional, tracked in PR body): - #4/#5: inline `pcm16ToWavBytes` + `synthesize` helpers — duplicate-by-design with PR2 (#513); merge-order constraint. - #7: `turnOutputEmitted` latch contract with PR3 executor — surface in PR3 review. - #8: distinguish natural end-of-turn from socket close — design-level, needs PR3 design conversation. - #9: `featurePath()` helper — extract once a 3rd test file would duplicate the climb. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(typescript-sdk/#372): voice OpenAI Realtime adapter (agent + user roles) (PR8 of N) Port `python/scenario/voice/adapters/openai_realtime.py` to TypeScript at `javascript/src/voice/adapters/openai-realtime.ts`. The adapter owns the OpenAI Realtime wire protocol directly — the model IS the agent under test (`role=AgentRole.AGENT`) or the voice-enabled user simulator (`role=AgentRole.USER`, per §7.2 L1164-1171). User-role critical path: scripted `user("text")` lines call `sendText`, which emits `conversation.item.create` (`input_text` content) + `response.create` directly. TTS is bypassed — the realtime model owns prosody synthesis. Wire-protocol behavior: - WSS to `wss://api.openai.com/v1/realtime?model=<model>` via `ws` - `session.update` post-connect (pcm16/24000 in/out, voice, instructions, tools, server-side VAD off so we own turn boundaries) - `sendAudio` → `input_audio_buffer.append` (deferred commit) - `receiveAudio` → commit + response.create on first call, loops over events until `response.audio.delta`; transcript deltas update `lastAgentTranscript`, Whisper user transcripts update `lastUserTranscript` - `interrupt()` → `response.cancel` (first-class interrupt per §5.6) Scenarios bound (`specs/voice-agents.feature`): - @unit @ts-openai-realtime — agent connect + user-simulator wiring - @e2e @ts-openai-realtime-agent-demo — live agent-role round-trip - @e2e @ts-openai-realtime-user-demo — live user-simulator with sendText Per-subject tags avoid collision with PR1's `voice-contract-surface.test.ts` which uses `includeTags: ["ts-bound"]` (single-axis OR). Dual-axis filters `[["unit", "ts-openai-realtime"]]` keep unit binding tight. Tests: - `javascript/src/voice/adapters/__tests__/openai-realtime.test.ts` — 2 @unit scenarios driven against an in-process `ws` server (asserts wire-protocol shape, transcript accumulation, response.cancel, capability matrix). 7 step assertions pass. - `javascript/examples/vitest/tests/voice/openai-realtime-agent.test.ts` — agent-role e2e demo, env-gated on `OPENAI_API_KEY` via `Scenario.skip`. - `javascript/examples/vitest/tests/voice/openai-realtime-user.test.ts` — user-role e2e demo proving `sendText` is the TTS-free path. Dependencies: - Adds `ws` 8.20.1 + `@types/ws` 8.18.1 to the javascript workspace (Realtime WSS transport). /browser-qa-against-prod evidence env-gated: `OPENAI_API_KEY` UNSET in the grinder's environment so e2e demos report as skipped. CI gate runs them when the secret is configured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address /review concerns (apiKey check, url init, structural tools, sync disconnect) Surfaced by /review skill (PR #535): - **Sync disconnect:** `disconnect()` now eagerly rejects any in-flight `receiveAudio` waiter and flushes the event queue instead of relying on the async `close` handler. Prevents waiters from blocking past the close and stale-queued events from leaking into the next session. - **API key validation:** `connect()` throws a named diagnostic when no key is set, instead of letting the request surface as a generic WebSocket 401. - **`url` init knob:** `OpenAIRealtimeAgentAdapterInit.url` lets tests point at a loopback WS server without subclassing the adapter. The unit test now constructs the adapter directly — the `TestAdapter` subclass is gone. - **Structural tool type:** `tools: unknown[]` → `RealtimeToolDef[]` (exported), so call-site typos surface at compile time. Sets the template for the four remaining adapter ports. - **Single timeout site:** dropped the unreachable outer-loop deadline check in `receiveAudio` — `_nextEvent` already arms a per-iteration timer that fires the same error. - **PCM16 truncate removed:** the AudioChunk constructor already enforces even-byte invariant; adapter-side truncation was belt-and-suspenders that would hide an upstream codec bug. - **E2E agent demo:** moved the `expect(chunk).toBeInstanceOf(AudioChunk)` assertion from `When` into `Then` where it belongs. Deferred (out-of-scope or PR3 territory): - Logger surface for non-JSON frame drops (Python emits `logger.debug`; TS port has no logger yet — file when the SDK introduces one). - `responseTimeout` / `responseTailSilence` / `responseMaxDuration` are inherited from `VoiceAgentAdapter` but inert until PR3 wires the executor. PR3 must consume them. Gates re-validated: build green (CJS + ESM + DTS), 383/383 tests pass, eslint clean on touched files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/e2e): import OpenAI Realtime adapter via voice namespace CI failure root cause: `AudioChunk`, `OpenAIRealtimeAgentAdapter`, `OPENAI_REALTIME_MODEL`, `silentChunk` are exposed at the package root via `export * as voice from "./voice"` — they're NOT named exports on the root barrel. Direct named imports resolved to `undefined`, so `expect(firstChunk).toBeInstanceOf(AudioChunk)` saw `undefined` and `new OpenAIRealtimeAgentAdapter(...)` was a `TypeError`. Switched both e2e demos to destructure from the `voice` namespace and narrowed the local type aliases to `voice.AudioChunk` / `voice.OpenAIRealtimeAgentAdapter`. Unit tests are unaffected — they import from the local `../../index` re-export and never see the package root. CI was running the e2e demos because `OPENAI_API_KEY` IS configured in the CI env. Locally the same path skips (key unset). The skip-path test exit was a false positive — the actual binding consistency check needed the run path to fire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(openai-realtime): drop deprecated Beta header (GA endpoint rejects it) CI surfaced the real issue: the OpenAI Realtime endpoint at `wss://api.openai.com/v1/realtime` is now GA and rejects the `OpenAI-Beta: realtime=v1` opt-in with: The Realtime Beta API is no longer supported. Please use /v1/realtime for the GA API. We were sending the header per Python parity (`python/scenario/voice/ adapters/openai_realtime.py`); the GA migration deprecates it. Dropped the header and updated the file-level docstring to document the choice. Python parity is intentionally broken here — Python adapter still sends the Beta header and will hit the same error. Track for back-port to keep the two SDKs aligned. Local: 383/383 unit tests pass, build green. CI re-run pending; e2e demos should now connect successfully against the GA endpoint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(openai-realtime): migrate session.update to GA shape CI surfaced "Missing required parameter: 'session.type'" after the Beta-header drop — the GA Realtime API restructured the session config significantly (per RealtimeSessionCreateRequest in openai-node realtime.ts). Migrated session.update payload: - session.type: "realtime" (required discriminator) - session.model: passes the model id explicitly - audio formats moved under session.audio.{input,output}.format as { type: "audio/pcm", rate: 24000 } objects - voice moved under session.audio.output.voice - transcription + turn_detection nested under session.audio.input Unit test wire-shape assertions updated to match. Old shape fields (input_audio_format, output_audio_format, top-level voice, top-level turn_detection) are gone; the assertions now look at audio.input.format, audio.output.voice, etc. Python parity is intentionally broken here — the GA migration deprecates the wire surface Python uses. Track for back-port to keep the SDKs aligned. The Python adapter will hit the same error against the live endpoint. Local: 383/383 unit tests pass, build green (CJS + ESM + DTS). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/e2e): GA voice + simplify agent-role smoke test Two CI issues after the GA wire-shape migration: 1. **Voice 'nova' is Beta-era, GA rejects it.** Supported voices are alloy/ash/ballad/coral/echo/sage/shimmer/verse/marin/cedar. Switched the user-role demo to `marin` (OpenAI's recommended modern voice). The BDD scenario text still names "nova" — that documents Python's parity intent; the test picks a valid GA voice. 2. **Agent-role demo deadlocks on silentChunk.** Sending 0.5s of silence to a Realtime session with `turn_detection: null` doesn't trigger the model; receiveAudio(20) times out and `chunk` stays null. The unit scenarios already prove the audio round-trip via a mock WS. The e2e demo's job is to prove live-endpoint connectivity, so rewrote it as a smoke test: - connect (GA handshake + session.update accepted) - interrupt (response.cancel round-trips against the live wire) - disconnect The Then assertion now verifies connectError is null and the capability matrix is published — wire health, not a model response. PR3 will drive real speech audio through the executor. Local: 383/383 unit tests pass. * fix(openai-realtime): handle GA audio event names CI: receiveAudio timed out after 81s on the user-role e2e demo. Root cause: GA renamed the streaming output events: Beta → GA response.audio.delta → response.output_audio.delta response.audio.done → response.output_audio.done response.audio_transcript.delta → response.output_audio_transcript.delta response.audio_transcript.done → response.output_audio_transcript.done The Beta names are no longer emitted by the live endpoint, so the receive loop never saw an audio frame. Updated the event matcher to accept both names. The new GA name wins on the live endpoint; the Beta alias keeps the existing unit tests (which push the legacy event names) working without churn, and makes back-port to any Beta-era endpoint trivial. Local: 383/383 tests pass. * feat(typescript-sdk/#372): voice Gemini Live adapter (PR9 of N) Ports python/scenario/voice/adapters/gemini_live.py → javascript/src/voice/adapters/gemini-live.ts using @google/genai (the new SDK; @google/generative-ai is the deprecated package). - GeminiLiveAgentAdapter with capabilities matrix (streaming transcripts, native VAD, interruption, pcm16/16000 in, pcm16/24000 out) - PCM16 24kHz↔16kHz resampler in pure JS (linear interpolation, no scipy) - Callback-to-queue bridge mapping the SDK's onmessage callback onto an awaitable receiveAudio(timeout) contract - @google/genai declared as optional peer dep; lazy-imported on connect() so the SDK ships without a hard Gemini coupling - 2 @unit scenarios (connect, capabilities matrix) bound via vitest-cucumber + 1 @e2e demo scenario (env-gated on GEMINI_API_KEY/GOOGLE_API_KEY) Refs #372. * fix(lint): reorder @langwatch/scenario import before vitest in e2e test * feat(typescript-sdk/#372): voice Pipecat adapter + g711 codec (PR10 of N) Ports python/scenario/voice/adapters/{pipecat.py,_twilio_shared.py} to TypeScript so voice scenarios can target a running Pipecat bot over the Twilio Media Streams WS protocol. WebRTC transport is deferred and raises PendingTransportError at connect() time. New files - src/voice/adapters/twilio-shared.ts — g711 µ-law 8 kHz ↔ PCM16 24 kHz codec + 24k/8k linear-interpolation resampler + Twilio Media Streams frame parser/builders. Reused by the upcoming TS Twilio adapter (PR11). - src/voice/adapters/pipecat.ts — PipecatAgentAdapter speaking the synthetic connected/start handshake, 20 ms µ-law media frames, clear for first-class interrupt, mark "utterance_end" as end-of-turn signal. - src/voice/adapters/pending-transport-error.ts — shared deferred- transport error class (parity with python _stub.PendingTransportError). - src/voice/adapters/__tests__/twilio-shared-codec.test.ts — binds the two @ts-codec scenarios (round-trip fidelity + sample-rate conversion) plus plain-vitest edge-case tests. - src/voice/adapters/__tests__/pipecat.test.ts — binds the three @ts-pipecat scenarios (WS round-trip, WebRTC PendingTransportError, clear-buffer interrupt) against a synchronous fake WebSocket. Capabilities advertised streamingTranscripts=true, nativeVad=true, dtmf=false, interruption=true, input/outputFormats=[pcm16/24000, mulaw/8000]. Notes for reviewers - 5 feature-file scenarios are bound (2 retagged, 3 new). Tag axis is @ts-pipecat / @ts-codec to match the @ts-<adapter> precedent set by PR #535 (OpenAI Realtime) and PR #536 (ElevenLabs). - /browser-qa-against-prod is env-gated on SCENARIO_PIPECAT_QA_WS_URL. CI does not set the var; documented under "/browser-qa note" in the PR body. No script ships in this PR — adding one would require a user-owned bot endpoint we don't have. - `ws` 8.20.1 + @types/ws 8.18.1 added as deps (matches PR #535). - tsconfig.target=ES2022 added (matches PR #535). * review fixes: receive buffer perf, binary-frame docs, test tag, edge cases Addresses 5 review concerns (review #540 synthesizer pass): - #1 perf: receive-side mulaw buffer now stores Uint8Array slices, not number[]; bufferMulaw is O(1) per call instead of O(n) per byte. - #2 docs: coerceFrameToText's 0x7b/0x5b heuristic is now documented as a known rare-collision risk (binary µ-law with first byte == { or [ would mis-route to JSON parser and silently drop). - #4 test pyramid: round-trip scenario re-tagged @unit (FakeWebSocket = no network) — real-WSS @integration demo deferred behind env-gated bot endpoint per /browser-qa note. - #5 coverage: 2 new edge-case tests for partial-buffer flush on bot-sent `stop` event and on socket-close. Not addressed in this PR (filed as follow-up considerations): - #3 vestigial audioFormat/sampleRate fields (inherited from Python parity) - #6 DTMF/E.164 validation regex port (pre-requisite for PR11 Twilio) - #8 extract TwilioMediaStreamsTransport helper (PR11 prep) - #9 JSON-frame size cap (no regression vs main; same constraint as Python) - #10 FakeWebSocket vs node:events (cosmetic) * feat(typescript-sdk/#372): voice Twilio adapter + tunnel harness (PR11 of N) Ports python/scenario/voice/adapters/{twilio,_twilio_server,_twilio_shared}.py to TypeScript: - `twilio-shared.ts` — µ-law/PCM16 codec (8 kHz ↔ 24 kHz resample inline, no `audioop` in Node), Media Streams JSON frame parser/builders, E.164 + DTMF validators, minimal Twilio REST client over fetch (no `twilio` npm SDK), HMAC-SHA1 signature verification. - `twilio.ts` — `TwilioAgentAdapter` extending `VoiceAgentAdapter`. Capabilities: `inputFormats: ["mulaw/8000"]`, `outputFormats: ["mulaw/8000"]`, `interruption: true` (clear-buffer event), `dtmf: true`. Implements `placeCall`, `waitForCall`, `sendAudio`, `receiveAudio`, `sendDtmf`, and `interrupt`. - `twilio-server.ts` — local HTTP + WS server (node `http` + `ws`) that impersonates Twilio's media-stream endpoint. Binds on an OS-assigned port (no hard-coded 8765). TwiML route returns `<Connect><Stream>` with the stream URL XML-escaped; signature gate fails closed. - `twilio-tunnel.ts` — wraps `@ngrok/ngrok` (preferred) with a `localtunnel` fallback. Both are dynamic-imported as optional peer deps so they don't bloat the runtime bundle. Scenarios bound in `specs/voice-agents.feature` via vitest-cucumber: - `@integration @ts-bound @ts-twilio-proto` x3 — capabilities, JSON protocol parser, clear-buffer interrupt (twilio.test.ts). - `@integration @ts-bound @ts-twilio-server` x2 — TwiML response shape + XML-escape, signature rejection (twilio-server.test.ts). - `@e2e @ts-bound @ts-twilio-tunnel` x1 — tunnel exposes local server. Env-gated on NGROK_AUTHTOKEN (twilio-tunnel.test.ts). Boy scout fixes in the same commit: - `tsconfig.json` — added `target: "ES2022"` so `tsc --noEmit` accepts top-level await + iterators. Without this, `pnpm typecheck` is broken on `main` post #517 (the @ts-bound retrofit shipped top-level await but didn't update the target). - `voice-contract-surface.test.ts` — narrowed `includeTags` from `["ts-bound"]` to `[["ts-bound", "ts-contract-surface"]]`. The retrofit's broad filter was destined to over-include any future `@ts-bound` scenario (PR-B/C/etc.); my Twilio scenarios surfaced the bug. Re-tagged the five contract-surface scenarios accordingly. - `package.json` — added `ws@^8.20.1` runtime dep + `@types/ws` devDep. Hazards documented in PR body: - PR10 (Pipecat g711) hadn't pushed at branch time, so PR11 owns `twilio-shared.ts`. When PR10 lands, the two files reconcile (same module name and surface area). - `@ngrok/ngrok` is a heavy native dep — kept optional and dynamic- imported so CI machines without NGROK_AUTHTOKEN don't pull it. - Tunnel test is env-gated; CI does not exercise it. Refs #372. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(twilio/#372): address /review concerns — logging, body cap, timing-safe compare, coverage Addresses 8 of the 13 actionable items from the /review fanout: Security: - twilio-server.ts: cap webhook body at 1 MB via streaming guard; reject with HTTP 413 instead of accumulating into memory (concern #7). - twilio-shared.ts: replace hand-rolled XOR signature compare with `crypto.timingSafeEqual` on decoded base64 buffers — Node-stdlib primitive, no DIY constant-time math (concern #10). - twilio-tunnel.ts: drop `(0, eval)("(name) => import(name)")` indirect; use bare dynamic `import()` in try/catch on ERR_MODULE_NOT_FOUND so bundlers and security scanners can analyze the path (concern #8). Coverage (the highest-risk port-only LOC was untested): - twilio.test.ts: codec round-trip — 100 ms 440 Hz sine wave through pcm16/24k → mulaw/8k → pcm16/24k, average abs sample-diff < 2000 (under 10 % of peak). Plus empty-input case. - twilio.test.ts: `verifyTwilioSignature` valid-signature accept, wrong-token reject, wrong-URL reject, missing-signature reject. - twilio.test.ts: `validateE164` + `validateDtmf` accept/reject + the TwiML-injection payload the docstring warns about. - twilio.test.ts: `onDtmf` callback fires on `dtmf` frame, `allowedCallers` filter rejects + records, stop-frame flush enqueues a final AudioChunk. Observability + boy-scout: - twilio-logger.ts (new): minimal `[twilio] …` console wrapper mirroring Python's `logging.getLogger("scenario.voice.twilio")`. Same log sites as the Python parity — body-cap violation, signature rejection, disallowed-caller reject, DTMF receipt, onDtmf callback error (concerns #1 + #14). - twilio-shared.ts: drop duplicate `PCM16_SAMPLE_WIDTH = 2`; import the canonical `PCM16_SAMPLE_WIDTH_BYTES` from `../audio-chunk` and rename call sites (concern #3). - twilio.ts: drop dead `UnsupportedCapabilityError` import + the `export type` re-export that papered over its unused state — base class re-exports via voice/index.ts already (concern #12). - twilio-tunnel.test.ts: wrap cucumber binding in `if (TUNNEL_ENABLED)`; on CI fall back to `describe.skip(...)` with a single placeholder `it` so the runner reports one skipped block instead of five vacuous greens (concern #5). Deferred (documented as follow-ups, not addressed here): - Refactor adapter↔server coupling into a `MediaStreamSession` value object (concern #2). Bigger architectural change; PR3+ executor wiring will exercise the seam first. - Migrate `makeDeferred` to `Promise.withResolvers()` (concern #9). - Replace `rejectedCount` instance field with `getStats()` snapshot (concern #11) — depends on the logger module's contract solidifying. - `call()` Liskov tension (concern #13) — same PR3+ wiring scope. Test surface: 33 passed + 1 skipped (was 27); full suite 409 passed + 1 skipped, build + typecheck green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(salvage): add CONSOLIDATION-MAP.md for voice/372-consolidation workbench * chore(voice/#372): unblock install — drop invalid-JSON SALVAGE comment, regen lockfile The keep-both consolidation merge left a `// SALVAGE-CONFLICT` comment inside package.json's dependencies block, making it invalid JSON. pnpm silently skipped dependency resolution (node_modules empty), blocking typecheck/test entirely. Both deps the marker straddled (`elevenlabs`, `fft.js`) were already present in the JSON — only the comment line was the conflict. Removed it (keep-both resolution preserved). Regenerated pnpm-lock.yaml from the now-valid manifest (the prior lock was the markers-stripped, "not semantically valid" artifact noted in CONSOLIDATION-MAP). Also adds docs/voice/REFACTOR-PROGRESS.md tracking the 11 EDR gaps + Tier A scope. Baseline after fix: `npx tsc --noEmit` = 5 errors, all in twilio-shared.ts (Gap #6 / Tier B). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/#372): repair tsconfig.json duplicate "target" key (blocked vitest) The consolidated tree had `"target": "ES2022"` twice in compilerOptions. `tsc` tolerated it (warning only), but vitest's oxc transformer rejects duplicate JSON keys with a hard TSCONFIG_ERROR, blocking ALL test execution. Removed the dup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #1 — split flat stt.ts into stt/ subtree, drop the global Per EDR §0.1/§5.3 and ADR-002: - New stt/ subtree, one file per provider: - stt-provider.ts: STTProvider interface + a "provider/model" router (resolveSttProvider / registerSttProvider / listSttProviders) - openai-stt.ts: OpenAISTTProvider (default gpt-4o-transcribe) - elevenlabs-stt.ts: ElevenLabsSTTProvider (scribe_v1) - wav.ts: shared pcm16ToWav upload encoder (de-dupes the two private copies) - index.ts: barrel + self-registration of the two providers - DELETED the module-global `let provider` + setSttProvider/getSttProvider — the process-wide mutable provider state that violated ADR-001. Provider state is now per-run on ScenarioConfig.voice (resolved in config.ts). - transcribe.ts: repointed off the global — `provider` option defaults to a per-run `new OpenAISTTProvider()` (pure default); explicit `null` = graceful degrade. - Tests: stt.test.ts rewritten as plain vitest unit tests for the providers + router (old @ts-stt binding matched nothing per EDR §7.4 and exercised removed APIs). transcribe.test.ts: "no provider" now expressed via provider:null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #7 — per-run VoiceConfig + resolveVoiceConfig (keystone) New voice/config.ts (EDR §0.1 Tier 1 + ADR-002). The keystone of the per-run state model — replaces both the STT module-global (Gap #1) and configure({stt}) (Gap #2): - VoiceConfig { stt?: STTProvider | SttConfig; tts?: TtsConfig; defaultAudioFormat?; audioPlayback?; include{Audio,Timeline,Traces}? } - SttConfig { model; language?; apiKey? }, TtsConfig { voice; format?; apiKey? } - ResolvedVoiceConfig — stt always a concrete provider; the resolved per-run object - resolveVoiceConfig(optionLevel, scenarioLevel, defaults?): two-tier merge with the RunOptions.voice override in front of ScenarioConfig.voice, then pure defaults; `stt` resolves `options?.voice?.stt ?? cfg.voice?.stt ?? new OpenAISTTProvider()` (the default provider constructed per-run — pure default, not shared state). - DEFAULT_STT_MODEL, DEFAULT_AUDIO_FORMAT ("pcm16", the AI-SDK file part per §4.2). stt accepts an STTProvider instance (BYO) or an SttConfig descriptor (routed via resolveSttProvider). AudioFormat is a string union (nothing consumes a richer record yet; AudioChunk fixes 24kHz mono). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #2 — de-invent configure({stt}); keep configure() for global exec Per EDR §0.1 + ADR-002 + PRD §4.7: - config/configure.ts: removed the invented `configure({ stt })` provider knob (present in no other PR, not in Python). `configure()` now carries only global *execution* settings — `audioPlayback` (PRD §4.7: stream conversation audio to local speakers). Stored in a module record read by the runner; getGlobalSettings() exposes it. (audioPlayback is a genuine global UX toggle, not per-run provider state — the ADR-001 concern is provider/model state flowing into call(), which this is not.) - configure.test.ts: rewritten to test the audioPlayback surface + a @ts-expect-error asserting `stt` is no longer accepted. - index.ts: updated the stale `configure({ stt })` comment; configure export stays. Provider config is per-run via run({ voice: { stt, tts } }), not global. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(voice/#372): Gap #3 — unify the two audio-message producers (LIVE BUG) Two producers shipped incompatible in-message audio formats, both under the OpenAI `input_audio` convention (a shape the judge's transcript builder doesn't even read): messages.ts wrapped PCM16 in WAV tagged format:"wav"; adapter.runtime.ts emitted raw PCM16 tagged format:"pcm16". Their paired extractors decoded by tag, so cross-feeding mis-decoded a WAV header as audio samples (EDR §7.8). Standardized on the SINGLE canonical AI-SDK `file` part (EDR §4.2) — `{ type: "file", mediaType: "audio/pcm16", data: <base64> }` with the transcript as a preceding text part. This is what realtime/response-formatter.ts already emits and judge-utils.ts#buildTranscriptFromMessages already truncates. - messages.types.ts: retargeted to the file-part shape (AudioFilePart = FilePart & { mediaType: `audio/${string}` }, AudioMessage = ModelMessage, AudioMessageParts). - messages.ts: ONE encoder (createAudioMessage → raw-PCM16 file part) + ONE extractor (extractAudio — reads the canonical file part; still tolerates legacy input_audio/audio + WAV at the adapter edge). Added hasAudio / extractTranscript. - adapter.runtime.ts: deleted its private createAudioMessage + extractAudioFromLastMessage (+ the dup base64 helpers); now imports the shared messages.ts gateway. - judge-agent.ts: conversationHasAudio now recognizes the canonical file audio part (it only knew input_audio/audio — so it couldn't see the standardized format). - messages.test.ts: rewritten for the file-part shape with an offline encode→extract round-trip (payload + transcript preserved) and a cross-producer guard asserting the realtime-style file message and createAudioMessage output agree — the Gap #3 regression guard (EDR §8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): resolve voice/index.ts SALVAGE markers for config/stt/messages Barrel cleanup (EDR §5.1) for the Tier A modules — removed the SALVAGE-CONFLICT markers and reconciled the exports: - Gap #4 (AgentSpeakingEvent): export once as the concrete class from ./adapter.runtime; the structurally-identical interface in ./adapter stays internal (the adapter's agentSpeakingEvent? field type). No external consumer imported it, so no breakage. - Gap #7: export the new per-run config surface (VoiceConfig/SttConfig/TtsConfig/ ResolvedVoiceConfig/resolveVoiceConfig/DEFAULT_*). - Gap #1: repoint STT exports to the ./stt subtree; drop setSttProvider/getSttProvider; add resolveSttProvider/registerSttProvider/listSttProviders. - Gap #3: messages re-exports updated (one createAudioMessage/extractAudio + new hasAudio/extractTranscript/AUDIO_PCM16_MEDIA_TYPE); messages.types re-exports retargeted to the file-part types. Left in place (Tier B): the twilio-shared (Gap #6) and composable Gap #5 markers — the barrel's adapter/tts exports still reference those unmerged modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(voice/#372): host wiring — ScenarioConfig.voice + per-run resolve in executor Tier A host wiring (EDR §0 host-side edits + ADR-002): - domain/scenarios/index.ts: ScenarioConfig gains `voice?: VoiceConfig` — the per-run carrier that reaches every call() via AgentInput.scenarioConfig (the only object that does; RunOptions does not). Module owns the type (config.ts), host owns the field. - runner/run.ts: RunOptions gains `voice?: VoiceConfig`; at the run() boundary the override is folded into cfg.voice field-by-field (`{ ...cfg.voice, ...options?.voice }`) so the carrier reaching call() reflects it. (Unlike langwatch, read once at the boundary — voice must ride ScenarioConfig because its consumers run inside call().) - voice-executor-state.ts: additive `voiceConfig?: ResolvedVoiceConfig | null` field (keeps the pr-538 interruption/backgroundNoise fields intact). - execution/scenario-execution.ts: the executor (which IS the VoiceExecutorState) gains a `voiceConfig` field, resolved via resolveVoiceConfig(undefined, cfg.voice) at run start when voice adapters are present — the resolved provider/knobs the judge STT pass + simulator TTS pass (Tier C) read, never a global. voice-models.ts (pr-536 EL/composable constants) and voice-executor-state.ts (pr-538 interruption fields) were already auto-merged intact — no reconciliation needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(voice/#372): mark Tier A gaps done in REFACTOR-PROGRESS + record cascades Gaps #1/#2/#3/#7 + host wiring done; #4 verified intact. Final tsc/test state, remaining 29 SALVAGE markers, Tier B/C cascades (twilio-shared as critical-path blocker, composable de-dup now owed), and intentional EDR deviations recorded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #6 — reconcile the two divergent twilio-shared.ts into one Resolve all 22 SALVAGE-CONFLICT markers in twilio-shared.ts: the keep-both merge of pr-540 (pipecat, codec-only) and pr-539 (twilio, codec+REST+validation) had physically interleaved the two function bodies, producing a parse error (TS1390 'if' as param name + TS1109 + TS1005) that masked full-program tsc and cascaded to 18 test files that transitively import the voice barrel. Single reconciled module: - ONE canonical codec (pr-540 semantics — required by twilio-shared-codec.test's same-rate identity `resamplePcm16(x,24000,24000) === x` and the round() output lengths). Canonical fn names mulaw8kToPcm16At24k / pcm16At24kToMulaw8k; the pr-539 names mulaw8kToPcm16_24k / pcm16_24kToMulaw8k kept as re-exported aliases so twilio.ts / twilio-server.ts keep their call sites unchanged. - KEEP pr-539's REST client (TwilioRESTHelper), validateE164/validateDtmf, redactE164/escapeXmlAttr, and verifyTwilioSignature (X-Twilio-Signature). - parseMediaStreamFrame returns the full MediaStreamEvent shape (event/streamSid/ callSid/payloadMulaw/dtmfDigit/markName) with the KNOWN_EVENTS guard; TWILIO_FRAME_BYTES / TWILIO_SAMPLE_RATE / TWILIO_FRAME_MS consts restored. Also resolves the two spec-side markers from the same pr-539/pr-540 keep-both: - specs/voice-agents.feature: drop the orphaned `@unit @ts-elevenlabs` tag that the merge stranded above the Twilio mulaw/8000 scenario (it was making elevenlabs.test bind a Twilio scenario → ScenarioNotCalledError). - voice-contract-surface.test.ts: adopt the AND-match filter includeTags:[["ts-bound","ts-contract-surface"]] so the contract-surface set no longer sweeps in every @ts-bound twilio scenario; drops the brittle excludeTags list. tsc: 5 twilio-shared parse errors → 0 (only the 3 pre-existing vitest Mock<> nits remain). Adapter cluster green: twilio, twilio-server, twilio-shared-codec, twilio-tunnel, pipecat, openai-realtime, gemini-live, elevenlabs, contract-surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #10 — split flat tts.ts into tts/ subtree + ElevenLabs TTS leaf Mirror the stt/ subtree (EDR §0 / §5.3): split the flat tts.ts into tts/{tts,openai-tts,elevenlabs-tts,index}.ts. - tts/tts.ts — the TtsProvider/TTSCallable/TtsEffectFn types, the PROVIDERS registry router, synthesize(), and the LRU cache. Cache invariant preserved verbatim: key = sha256(text)+voice; effects applied AFTER cache read so raw text never enters the payload (tts.test green, 4/4). - tts/openai-tts.ts — the OpenAI TTS leaf (openaiTts callable, gpt-4o-mini-tts, pcm response format). - tts/elevenlabs-tts.ts — NEW leaf (Gap #10): ElevenLabsTtsProvider + elevenLabsSynthesizeBytes (eleven_v3, output_format pcm_24000). Standalone bytes fn carries the apiKey + clientFactory test seam so the composable agent can de-dup onto it (Gap #5, next commit). Satisfies the PRD elevenlabs/rachel headline — voice="elevenlabs/<id>" now resolves through the TTS registry. - tts/index.ts — barrel + side-effect registration of both prefixes (mirrors stt/index.ts). Directory import keeps both `./tts` (barrel) and `../tts` (tts.test) resolving with zero path churn (moduleResolution: bundler). Dropped the tts SALVAGE-CONFLICT marker in voice/index.ts. tsc: unchanged (only the 3 pre-existing vitest Mock<> nits remain). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #5 — de-dup composable.ts onto canonical stt/tts; collapse EL files Gap #5: adapters/composable.ts no longer defines its own divergent copies. - DELETE the local STTProvider interface → import the canonical one from ../stt. - DELETE the local ElevenLabsSTTProvider → import from ../stt (re-exported from composable so the EL preset + tests keep their import sites). The canonical ../stt/elevenlabs-stt.ts leaf is switched to the SDK-based shape ({apiKey, clientFactory} + speechToText.convert) — the implementation that actually has transcribe() test coverage in elevenlabs.test; the prior fetch-based leaf had only an instanceof check. stt.test still green. - DELETE the inline synthesize() + the 4th pcm16ToWavBytes copy. composable's synthesize wrapper now routes the elevenlabs path through the tts/elevenlabs-tts leaf (Gap #10) honoring the apiKey + elevenLabsClientFactory test seam, and every other provider through the canonical ../tts registry. Task 5 (EL file collapse): fold ElevenLabsVoiceAgent (the local branded composable preset) into adapters/elevenlabs.ts next to the hosted ElevenLabsAgentAdapter, and delete adapters/eleven-labs-voice-agent.ts — one ElevenLabs file. NOTE: these are two distinct responsibilities (hosted ConvAI transport vs local composable preset), not one "ConvAI transport adapter" as the EDR §0.1 note assumed; collapsing into a single file (rather than merging the classes) preserves both behaviors + all 5 elevenlabs.test scenarios. Flagged for review. adapters/index.ts repointed: ElevenLabsVoiceAgent now from ./elevenlabs; STTProvider/ElevenLabsSTTProvider re-exported from composable (which sources them from ../stt). Dropped the Gap #5 SALVAGE-CONFLICT marker in voice/index.ts. tsc: only the 3 pre-existing vitest Mock<> nits remain. Green: elevenlabs (all 5 scenarios + 14 wire-protocol unit tests), composable, stt, transcribe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): Gap #11 — settle call() across leaves on the runtime default The transport leaves shipped stub call() overrides ("PR3 will wire this") that threw or returned "" — pipecat/twilio/openai-realtime threw, gemini-live returned "". PR3's defaultVoiceCall is now the base VoiceAgentAdapter.call() (adapter.ts:67 → adapter.runtime.defaultVoiceCall). Remove the leaf overrides so pipecat, twilio, openai-realtime, gemini-live, and the hosted ElevenLabsAgentAdapter all inherit the one runtime default (send last user audio → drain agent response on tail-silence → record segments → return the canonical file audio message). The not-yet-connected path: defaultVoiceCall drives sendAudio/receiveAudio, which already raise each adapter's "not connected" error; pipecat additionally raises PendingTransportError at connect() for transport="webrtc". A uniform connected- state gate inside defaultVoiceCall is a larger executor change (no uniform accessor across leaves; no test requires it) — left for Tier C and noted. composable.ts keeps its own call() — it is the local BYO agent that runs the full STT→LLM→TTS loop itself, not a thin transport; its tests drive sendAudio/receiveAudio directly and never call() it. Removed now-dead AgentInput/AgentReturnTypes imports from gemini-live. Resolved the last two voice/index.ts SALVAGE-CONFLICT markers (effects barrel, pipecat) — zero markers remain in javascript/src + specs. tsc: only the 3 pre-existing vitest Mock<> nits remain. Green: gemini-live, openai-realtime, twilio, pipecat, elevenlabs, adapter-lifecycle (93 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(voice/#372): clear the 3 pre-existing vitest Mock<> type nits → tsc clean Tier A documented 3 residual tsc errors (transcribe.test:70, tts.test:48, user-simulator-voice.test:70) as pre-existing vitest-4 Mock<> typing frictions, masked at the Tier A baseline by the twilio-shared parse error. They are the only non-twilio errors and block the Tier B gate ("tsc --noEmit clean"). Minimal, test-only casts (matching the file's existing `as unknown as` style): - transcribe.test: spy as unknown as STTProvider["transcribe"] at the inline call-site (the const-annotated mocks elsewhere in the file already typecheck). - tts.test: synthSpy as unknown as TTSCallable + import the TTSCallable type. - user-simulator-voice.test: the scenarioState stub object → `as unknown as` AgentInput["scenarioState"] (it doesn't structurally overlap the Like type). Runtime behavior unchanged (oxc strips types; all 24 tests in the three files still pass). `npx tsc --noEmit` now reports 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(voice/#372): record Tier B done (Gaps #5/#6/#10/#11) + cascades to Tier C Mark Gaps #5/#6/#10/#11 done with commit SHAs; add the Tier B section (convergence gate evidence: tsc clean, full suite 44/1-skip, 0 SALVAGE markers), the EL-file- collapse review flag, the Gap #11 not-connected partial, and the Tier C cascade list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(voice/#372): attach audio/timeline/latency to ScenarioResult (Gaps A+B) Tier C executor audio gaps: - Gap A: setResult() now attaches result.audio/timeline/latency for voice runs via buildVoiceResultFields(); latency finalized once at end-of-run (avg/p50/p95 via computeLatencyMetrics). Text-only runs leave the fields undefined (back-compat). - Gap B: adapter.runtime.ts emptyRecording() returns a VoiceRecordingRuntime instance (not a bare object) so result.audio.save()/saveSegments() exist. Verified offline (no real keys) by a new ScenarioExecution.execute() test with a voice FakeVoiceAdapter + audio user-sim + fake judge: result.audio instanceof VoiceRecordingRuntime, segments>0 (user+agent), timeline populated, latency.measurements>0, save() round-trips a WAV. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(voice/#372): add lowercase adapter factories (PRD §9 idiom) Adds thin new-XAgentAdapter() factory wrappers — pipecatAgent, openAIRealtimeAgent, geminiLiveAgent, elevenLabsAgent, twilioAgent, composableAgent — in voice/factories.ts. Exported from voice/index.ts and merged onto the top-level scenario object so the documented PRD §9 idiom scenario.pipecatAgent({...}) works. Class forms stay public (EDR §0 barrel lists both). voice namespace also exposes the factories. Verified: factories.test.ts — each factory returns the right adapter class (instanceof), reachable via both scenario.* and the voice namespace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(voice/#372): net-new judge STT pre-pass (judge-stt.ts) EDR §3.3 / §7.7 — automatic transcription of audio file-parts to text BEFORE buildTranscriptFromMessages, using the per-run resolved STT provider (cfg.voice.stt). The judge reads spoken words, not a [AUDIO: …] byte-marker. No 'judge requests transcript' tool (§7.3) — STT is upstream + automatic. - voice/judge-stt.ts: prepareJudgeInput({messages, stt, options}) — transcribes audio parts to text; keeps audio for multimodal models iff includeAudio, strips it otherwise; reuses an existing transcript text part (no STT call); STT failures degrade gracefully (drop audio, warn, continue). - JudgeAgent.call(): transcribeAudioForJudge() resolves stt off input.scenarioConfig.voice and runs the pre-pass when the conversation has audio (text-only fast path otherwise — no provider constructed). Exported from the voice barrel. Verified: judge-stt.test.ts (6) — unit cases + JudgeAgent.call() integration with stubbed STT+LLM shows the transcript view carries text, no base64 leak. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(voice/#372): wire user-simulator per-run TTS (Task 5) EDR §3.2 — the simulator's default _synthesize now routes through the per-run voice/tts registry (synthesize()), not the old throwing PR2 stub. Effects still apply AFTER the (text,voice) cache read (voiceify, unchanged invariant). - _synthesize default → voice/tts#synthesize (per-run router + …
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.