fix(voice/#350): post-merge cleanup — capability AC, disconnect logging, doc drift, e2e wedge#492
Conversation
requires-python is >=3.10, so PyPI 3.8/3.9 classifier entries were false-positives in package search. The doc-drift half of #489 (prove-it report scenario counts) is moot — docs/proposals/issue-350-prove-it-report.md was already deleted in f423d67 (May 11) as a relocated process artifact. Closes #489. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docstring at scenario_executor.py:750-751 promises "failures are logged but do not mask the primary scenario result", but the implementation silently swallowed every Exception. That made real operational risks invisible — most notably the Twilio B-leg voice_url restore path, where a failure leaves a customer account clobbered with no operator signal. Adds logger.warning(..., exc_info=True) at both swallow sites (adapter.disconnect() and the ffmpeg playback .stop()), keying the adapter message on type(agent).__name__. Exceptions are still absorbed so cleanup completes — only their visibility changes. Closes #488. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dapter tests The interruption capability is published by adapters (Pipecat, Twilio, OpenAI Realtime, Gemini Live all set interruption=True) and gated by the base VoiceAgentAdapter.interrupt() default, which raises UnsupportedCapabilityError. Until now neither the feature file's "Every adapter publishes a capabilities attribute" scenario nor the parametrized adapter assertions named the field, so a TS-port (#372) re-implementation could silently drop it. - specs/voice-agents.feature:754 — name `interruption` in the declared-capability list. - python/tests/voice/test_adapters.py:202-211 — assert every adapter declares interruption as a bool, alongside the other flags. Catches any future capability removed by accident. - python/tests/voice/test_capabilities.py — extend the conservative- defaults assertion to cover interruption; add a negative-case test that BareVoiceAdapter (interruption=False) raises UnsupportedCapabilityError when ``await adapter.interrupt()`` is called against it, naming both adapter and capability. - docs/voice/capability-matrix.md — fix Gemini Live cell from ❌ to ✅ (the adapter declares interruption=True via Activity markers; the matrix said ❌, drift caught while scoping this fix). Update the Deferred section to drop Gemini Live from the ElevenLabs-only "no native cancel" note. Closes #487. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on requirement VAD half (full delivery): - Parametrize WebRTCVadFallback warning rate-limit assertions to lock in the three documented cases: ["A","A"] (1), ["A","B"] (2), cross-instance same-string (1). The dedupe is on the caller-passed adapter_name string, memoized in the ClassVar set across all instances — the parametrized form makes that property obvious and cheaper to extend. - Keep one scalar test for the warning *content* (names adapter, mentions native VAD + accuracy caveat) — that's a different assertion shape than rate-limit count. E2E wedge half (partial — diagnosis only, no fix): - TESTING.md now documents the isolation requirement: the multi-turn voice @e2e demos wedge when run in a single pytest process. They're already deselected from default CI (conftest auto-marks *_e2e.py as integration; python-ci uses -m "not integration") and additionally carry @pytest.mark.skip. Documents the per-cluster invocation pattern for nightly voice-integration runs. - Wedge root-cause not isolated in this PR. Likely contributors (background tasks not reaped between function-scoped loops, ffmpeg subprocess lingering, max_turns runaway) called out in the new TESTING.md section. Refs #491 — VAD AC done, isolation-requirement AC done. Wedge root-cause AC stays open in #491 with diagnosis attached as a comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ss (CodeQL nit) CodeQL flagged five locations where the empty-body convention `async def foo(self) -> None: ...` was used as a stub on test-only inline VoiceAgentAdapter subclasses (BareVoiceAdapter, ExplodingVoiceAdapter). Replace each with an explicit `pass` to silence the bot. No behavior change — the stubs are still no-ops, and both touched tests stay green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Automated low-risk assessment This PR was evaluated against the repository's Low-Risk Pull Requests procedure.
An approving review has been submitted by automation. The PR may merge once required CI checks pass. |
|
[grinder] READY for human review CI: green (14 non-skipping checks passing, 0 failing, 0 pending) Verification Report
Verdict: 5/5 PASS Verified by: |
Feature file `specs/voice-agents.feature:971` (added by main commit 71dd5ed / PR #492) lists `interruption` in the adapter-capabilities declaration. The vitest-cucumber binding at voice-contract-surface.test.ts:177 still had the pre-71dd5ed step title (missing `interruption`), so StepAble couldn't find the matching feature step. Update the step title to match the feature file and add the live-adapter `typeof caps.interruption === "boolean"` check (the empty-adapter check on line 192 already exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Feature file `specs/voice-agents.feature:971` (added by main commit 71dd5ed / PR #492) lists `interruption` in the adapter-capabilities declaration. The vitest-cucumber binding at voice-contract-surface.test.ts:177 still had the pre-71dd5ed step title (missing `interruption`), so StepAble couldn't find the matching feature step. Update the step title to match the feature file and add the live-adapter `typeof caps.interruption === "boolean"` check (the empty-adapter check on line 192 already exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Feature file `specs/voice-agents.feature:971` (added by main commit 71dd5ed / PR #492) lists `interruption` in the adapter-capabilities declaration. The vitest-cucumber binding at voice-contract-surface.test.ts:177 still had the pre-71dd5ed step title (missing `interruption`), so StepAble couldn't find the matching feature step. Update the step title to match the feature file and add the live-adapter `typeof caps.interruption === "boolean"` check (the empty-adapter check on line 192 already exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bundles four post-merge voice-#350 cleanup issues into one PR by request. Source: fresh-eyes audit of PR #355.
Four atomic commits — one per issue — so any one can be reverted or cherry-picked without losing the others.
Closes #487
Closes #488
Closes #489
#491is partial: VAD parametrized test + isolation-requirement docs land here; wedge root-cause diagnosis stays open as a follow-up with a comment attached. Refs #491.#490(15 MB recordings) is intentionally not in this PR — needs a maintainer policy choice between LFS / golden-per-cluster / manifests-only / CDN.#489 — doc drift on @Unit scenario count + stale Python classifiers
The doc-drift half is moot:
docs/proposals/issue-350-prove-it-report.mdwas already deleted inf423d67(May 11) as a relocated process artifact. The remaining work is the stale classifier fix.python/pyproject.toml:17-21— dropProgramming Language :: Python :: 3.8and3.9entries.requires-pythonis>=3.10, so those were PyPI search false-positives.Evidence:
git diff c833990^..c833990 -- python/pyproject.tomlshows two lines removed.#488 — log disconnect failures in
_voice_disconnect_allThe docstring at
python/scenario/scenario_executor.py:750-751promised "failures are logged but do not mask the primary scenario result", but the implementation silently swallowed everyException. That made real operational risks invisible — most notably the Twilio B-legvoice_urlrestore path, where a failure leaves a customer account clobbered with no operator signal.python/scenario/scenario_executor.py:758-768— addlogger.warning(..., exc_info=True)in both swallow sites (adapterdisconnect()andffmpeg_playback.stop()), keying the adapter message ontype(agent).__name__. Exceptions are still absorbed so cleanup completes — only their visibility changes.python/tests/test_scenario_executor.py::test_voice_disconnect_logs_adapter_failures— fakeExplodingVoiceAdapterwhosedisconnect()raises; asserts a WARNING record with adapter name +exc_infois captured.Evidence:
#487 —
interruptionin published-capabilities AC + parametrized adapter testsThe
interruptionfield exists inAdapterCapabilitiesand is set by Pipecat, Twilio, OpenAI Realtime, and Gemini Live adapters, but was not named in the feature file's published-capabilities scenario nor asserted by the parametrized adapter test. TS port (#372) would have inherited the gap.specs/voice-agents.feature:754— addinterruptionto the declared-capability list.python/tests/voice/test_adapters.py:202-211— extendtest_every_adapter_publishes_capabilitiesto assert each capability flag (streaming_transcripts,native_vad,dtmf,interruption) is a bool, alongside the existing format-list checks.python/tests/voice/test_capabilities.py— coverinterruption=Falsedefault + newtest_interrupt_raises_unsupported_when_interruption_false(aBareVoiceAdapterwith default capabilities raisesUnsupportedCapabilityErrorfromawait adapter.interrupt(), naming both adapter and capability).docs/voice/capability-matrix.md— fix Gemini Live cell from ❌ to ✅ (the adapter declaresinterruption=Truevia Activity markers; the matrix said ❌ — drift caught while scoping this fix). Update the Deferred section to drop Gemini Live from the ElevenLabs-only "no native cancel" note.Evidence:
#491 — partial — VAD rate-limit tests + isolation-requirement docs
Full delivery on the VAD half:
python/tests/voice/test_vad.py— replace the two scalar warning-count tests with a single parametrizedtest_vad_fallback_rate_limits_by_adapter_namecovering the three documented cases:["A","A"]→ 1 warning["A","B"]→ 2 warningsadapter_namestring in a ClassVar set, across all instances)Partial delivery on the wedge half:
TESTING.md— new section documenting the @e2e suite isolation requirement: the multi-turn voice demos wedge when the full suite is run in a single pytest process. Notes that conftest auto-marks*_e2e.pyas integration (deselected from default CI), the@pytest.mark.skipmarkers are belt-and-suspenders for the nightlyvoice-integration.ymldispatch, and the per-cluster invocation pattern that avoids the wedge.couple hours, ship VAD + docs if stubborn) hit; LLM credits preserved for demo verification.Evidence:
Test plan
#489— classifier diff inspected; no runtime tests needed.#488—pytest tests/test_scenario_executor.py -k voice_disconnect -vpasses.#487—pytest tests/voice/test_capabilities.py tests/voice/test_adapters.py tests/voice/test_feature_file_contract.py -vpasses (64 tests).#491(partial) —pytest tests/voice/test_vad.py -vpasses (6 tests). Wedge diagnosis posted on the issue.🤖 Generated with Claude Code