chore: main-side cleanup (docs + spec + python/TS parity) #586
…ed voice/adapters/twilio.mdx) The orphan was a hands-on smoke-test playbook (Twilio trial setup, cloudflared install, 3 runnable smoke scripts, manual webhook reset recipe). It is superseded by the published reference doc at docs/docs/pages/voice/adapters/twilio.mdx, which is the canonical user-facing source. The orphan referenced example files that no longer exist in the current tree (voice_pipecat_scenario.py, voice_twilio_agent_answers_scenario.py, voice_twilio_simulator_calls_human_scenario.py), so keeping it would have actively sent users to dead links. Verified zero external refs: `rg -l voice-twilio --hidden` returns nothing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
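For readers without ripgrep, the zero-refs check can be approximated in Python. This is a minimal sketch, not the command that was run: `files_referencing` is a hypothetical helper, and unlike `rg` it scans hidden directories but does not honor `.gitignore` rules, so it may report hits in ignored files.

```python
import os
from pathlib import Path


def files_referencing(root: Path, needle: str) -> list[Path]:
    """Return files under `root` (hidden ones included) whose text contains `needle`."""
    hits: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip non-UTF-8 (e.g. binary) or unreadable files
            if needle in text:
                hits.append(path)
    return sorted(hits)
```

An empty list plays the role of ripgrep's empty output.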
…process doc convention) Process docs idiomatically live in .github/ alongside CODEOWNERS, PULL_REQUEST_TEMPLATE, and contributing guides. The Low-Risk PR policy is GitHub-workflow scaffolding, not user-facing product docs. Coordinated updates to every load-bearing reference:
- .github/workflows/pr-auto-approve.yml: RESTRICTED_PATTERN regex, 3 file-path uses, 2 GitHub URLs, 1 comment.
- specs/langwatch-pr-gate-pattern.feature: 3 spec assertions and the AC-X4 coverage-map comment.
- scripts/validate-pr-auto-approve.sh: EXPECTED_PATTERN regex (must match the workflow byte-for-byte), AC-X3 policy-path grep assertions and messages, comment-URL assertion, plus a new fail-guard against the prior docs/ path resurfacing.
Verified the validator's AC-X3 and AC-1.6 checks all pass on the new state. Two pre-existing AC-1.11 failures on main are unrelated to this move (the legacy workflows those checks expect were already deleted by a later PR in the gate-swap sequence). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
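The byte-for-byte requirement between `EXPECTED_PATTERN` and the workflow's `RESTRICTED_PATTERN`, plus the new fail-guard against the old path, can be sketched in Python. The real validator is a shell script; the one-line `RESTRICTED_PATTERN: '<regex>'` layout assumed here and the `check_workflow` helper are hypothetical.

```python
import re

# Assumed layout: the regex is declared on one line as  RESTRICTED_PATTERN: '<regex>'
_DECL = re.compile(r"^\s*RESTRICTED_PATTERN:\s*'(?P<pattern>.*)'\s*$", re.MULTILINE)


def check_workflow(workflow_text: str, expected_pattern: str, stale_path: str) -> list[str]:
    """Return a list of validation errors (empty means the workflow passes)."""
    errors: list[str] = []
    match = _DECL.search(workflow_text)
    if match is None:
        errors.append("RESTRICTED_PATTERN declaration not found")
    elif match.group("pattern") != expected_pattern:
        # Exact string comparison: no whitespace or escape normalization.
        errors.append("RESTRICTED_PATTERN drifted from EXPECTED_PATTERN")
    if stale_path in workflow_text:
        errors.append(f"stale policy path resurfaced: {stale_path}")
    return errors
```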
Audit feedback: these were real docs with real refs but were unpublished — discoverable only by grep. Moved into the vocs pages tree + added to the sidebar so users actually find them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f-truth Audit feedback: three files claimed "capability matrix" with slightly different content (Python source-of-truth, TS mirror, published mdx). Reduced to one: the published mdx is now the canonical source. Both sides reference the public URL. Error messages in Python and TS code now point users at the live docs URL instead of an unpublished markdown file. JS contract-surface test points at the published mdx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rkflows Workflows approval-or-hotfix.yml and low-risk-evaluation.yml were deleted in PR #4 of the gate-swap sequence (already merged on main), but the spec still asserted that the diff of PR #1 must not modify them. Both PRs have long since landed and the files no longer exist on main, so the assertion is no longer meaningful. The AC mapping comment is preserved with a tombstone explaining the removal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y with TS) Mirrors the TS rename in PR #561. "Outputs" describes the purpose (what the example tests produced); "recordings" describes the format (audio recordings). The recording artifact name is preserved inside the helper (save_demo_recording, _recording_helper.py, _RECORDINGS_ROOT variable); only the on-disk dir changes. Updates:
- git mv python/recordings → python/outputs (68 files; rename detected)
- python/examples/voice/_recording_helper.py: directory path + docstrings
- python/examples/voice/pipecat_{scenario,ws}.py: comment paths
- python/.env.example: SCENARIO_LOG_FILE default
- .gitignore: all python/recordings patterns
- .github/workflows/voice-integration.yml: upload-artifact path
- specs/voice-agents.feature: 3 mentions in scenario text
- specs/voice-docs-surface.feature: 1 mention in troubleshooting scenario
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…red ambience) Python was shipping 0.5s/24KB single-tone placeholders; TS upgraded to 3s/144KB layered cafe/office/highway/wind ambience in PR #561. Bringing Python into parity so backgroundNoise() ships the same quality on both sides. Byte-identical to the TS asset bundle (same deterministic seeded generator). WAVs: 144044 bytes each, 3.00s, mono PCM16 24kHz. LICENSES.md: updated to describe the new generator + layered content; notes the cross-language byte-identical copy with a reference to the single-canonical-generator follow-up at #588. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
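The 144044-byte figure follows from the format: 24,000 samples/s × 3 s × 2 bytes (PCM16, mono) = 144,000 bytes of audio, plus a 44-byte canonical WAV header. The old 0.5 s placeholders work out to 24,000 + 44 bytes, the "24KB" above. The sketch below checks that arithmetic and shows why a fixed seed yields byte-identical files. It is not the project's generator: it writes plain white noise from Python's `random.Random` rather than layered ambience, and `render_noise_wav` is a hypothetical name; it would not reproduce the TS bytes.

```python
import io
import random
import struct
import wave

SAMPLE_RATE = 24_000  # Hz
DURATION_S = 3
SAMPLE_WIDTH = 2      # PCM16: 2 bytes per sample
CHANNELS = 1          # mono


def render_noise_wav(seed: int) -> bytes:
    """Render seeded white noise as a mono PCM16 WAV (illustrative only)."""
    rng = random.Random(seed)  # same seed -> same samples -> same bytes
    n_frames = SAMPLE_RATE * DURATION_S
    samples = [rng.randint(-8000, 8000) for _ in range(n_frames)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{n_frames}h", *samples))
    return buf.getvalue()  # 44-byte header + 144,000 data bytes
```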
User feedback: outputs/ should be a parent for all test-run artifact types (recordings now, traces/logs/screenshots later). Adds the recordings/ nesting + a new outputs/README.md describing the shape. Writer in _recording_helper.py updated to point at outputs/recordings/. Symmetric with the TS-side nesting in PR #561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The JSON spec only says object keys SHOULD be unique, and most parsers accept duplicates (last one wins), but Node 24's tsx/oxc parser rejects them with a strict JSON parse error. This was breaking 21 test files in the ci-checks (24.x) CI matrix on both main and PR #586. Removes the duplicate; the first occurrence (same value, ES2022) is kept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
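Python's `json` module behaves like the permissive parsers described above and keeps the last value for a repeated key. A duplicate-key guard like the hypothetical `strict_loads` below would have flagged this before Node 24 did. It assumes plain JSON; a real `tsconfig.json` may also contain comments or trailing commas, which `json.loads` rejects.

```python
import json


def _reject_duplicates(pairs):
    """object_pairs_hook that fails instead of silently keeping the last value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str):
    """json.loads, but raise ValueError on duplicate object keys."""
    return json.loads(text, object_pairs_hook=_reject_duplicates)
```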
Pre-existing failure masked by the tsconfig parse error; surfaced after PR #586's tsconfig fix unblocked the test loader. Commit 71dd5ed (origin/main, 2026-05-29) added 'interruption' to the feature step:
  And it declares: streaming_transcripts, native_vad, dtmf, interruption, input_formats, output_formats
but did not update the matching binding string in voice-contract-surface.test.ts, so vitest-cucumber raised StepAbleUnknowStepError for 'Scenario: Every adapter publishes a capabilities attribute' at specs/voice-agents.feature:751. The binding body already asserted empty.interruption; only the title string was stale. Aligning it restores the green path:
  Test Files  1 passed (1)
  Tests  16 passed (16)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gemini-3-pro-preview returned 404 on the live Gemini API: "This model models/gemini-3-pro-preview is no longer available. Please update your code to use a newer model for the latest features and improvements." Swapped both occurrences (JudgeAgent + litellm.completion) for gemini-2.5-pro — current stable pro-tier Gemini, same intent as the original (pro-tier judge). The other Gemini reference in this file (UserSimulatorAgent on gemini-2.5-flash) and other repo Gemini call sites already use the gemini-2.5 family. Pre-existing failure on main; this unblocks main CI and PR #586. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GitHub API behind `gh pr diff` returns HTTP 406 when a PR's diff exceeds GitHub's 20k-line cap (observed on PR #561's evaluate run 26644602950). The downstream "Fail fast for oversized diffs" step already handles `oversized=true` gracefully (sets qualifies=false, exits 0, posts a manual-review-required review), but the upstream `gh pr diff` call crashes before the workflow reaches it. Wrap the fetch: on HTTP 406, set oversized=true and exit 0. Other failures still propagate as exit 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
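The exit-code policy above can be stated as a small pure function. A minimal sketch: the real step is shell inside `pr-auto-approve.yml`, `classify_diff_fetch` and `FetchOutcome` are hypothetical names, and matching the literal `HTTP 406` in stderr is an assumption about how gh reports the failure (the PR body describes matching a 20k-line error string instead, which is why the marker is a parameter).

```python
from dataclasses import dataclass, field


@dataclass
class FetchOutcome:
    exit_code: int
    outputs: dict[str, str] = field(default_factory=dict)


def classify_diff_fetch(returncode: int, stderr: str, marker: str = "HTTP 406") -> FetchOutcome:
    """Map a `gh pr diff` result to the step's exit code and step outputs."""
    if returncode == 0:
        return FetchOutcome(exit_code=0)
    if marker in stderr:
        # Oversized diff: succeed and let the downstream fail-fast step handle it.
        return FetchOutcome(exit_code=0, outputs={"oversized": "true"})
    # Auth, network, or any other failure still fails the job.
    return FetchOutcome(exit_code=1)
```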
Follow-up to bafdbf7: the 406-handling fix now emits an oversized_reason output so the downstream "Fail fast for oversized diffs" step can post a more honest review explaining WHY evaluation was skipped (fetch failure vs char-count exceeded). Pass the reason via the `env:` pattern (an intermediate environment variable rather than direct ${{ }} expansion in run:), per the documented GitHub Actions shell-injection hardening pattern. Use a heredoc for GITHUB_OUTPUT writes, since gh CLI stderr can be multi-line and would otherwise corrupt the output-file format. Pattern source: PR #572 commit 7dd9311. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
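A multi-line value in `GITHUB_OUTPUT` is written as `name<<DELIMITER`, then the value, then the delimiter on its own line. A minimal Python rendering of that format follows; the workflow itself uses a bash heredoc, and `github_output_block` / `append_output` are hypothetical helpers. A random delimiter is used so arbitrary gh stderr cannot terminate the block early.

```python
import secrets


def github_output_block(name: str, value: str) -> str:
    """Render one multi-line step output in GITHUB_OUTPUT's heredoc format."""
    delimiter = f"EOF_{secrets.token_hex(16)}"  # 128 random bits
    if delimiter in value:
        # Practically impossible, but never emit an ambiguous block.
        raise ValueError("delimiter collision; retry")
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


def append_output(output_path: str, name: str, value: str) -> None:
    """Append to the file named by $GITHUB_OUTPUT (path passed in explicitly here)."""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(github_output_block(name, value))
```

On the read side, the fail-fast step would map the output into `env:` (for example `REASON: ${{ steps.fetch.outputs.oversized_reason }}`, with `fetch` as a hypothetical step id) and reference only `"$REASON"` inside `run:`.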
/prove-it: PASS + /review: clean (re-verified at All ACs across #571 / #587 / #589 / #590 + PR-body claims map to first-hand evidence:
Review findings addressed (commit
Non-fixes (deliberate): spec
🤖 /prove-it + /review via Claude Code
The AC-1.11 block asserted approval-or-hotfix.yml and low-risk-evaluation.yml still exist, but both were deleted in PR #4 of the gate-swap sequence, so the validator exited 1 even though the workflow is correct. The matching spec scenario was already removed in this PR; this aligns the validator script. Also fix the noise-sample LICENSES.md regeneration note, which pointed at generate-noise-samples.mjs — a script that ships with PR #561 and is not on main yet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📣 Slack PR-review request posted to #dev: thread. (CI 🟢 on
Automated low-risk assessment
This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.
This PR requires a manual review before merging.
Summary
Consolidated main-side cleanup PR: docs housekeeping + spec housekeeping + python/TS parity fixes. Surfaced by a directory-structure audit on PR #561. Lands the changes for the follow-up issues listed under Closes below (most already closed; see that section).
File-structure changes
What moved, what was deleted, what was added (the recordings tree moved wholesale — 68 files — shown collapsed):
Changes
Docs housekeeping
- Delete `docs/voice-bug-bash.md` (zero refs, stale to closed PR #355) and `docs/voice-twilio.md` (superseded by the published `voice/adapters/twilio.mdx`).
- `docs/voice/happy-path-{elevenlabs,openai-realtime}.md` → `docs/docs/pages/voice/*.mdx`, with sidebar entries in `docs/vocs.config.tsx` so they render on the docs site.
- Delete `docs/voice/capability-matrix.md` and `javascript/docs/voice/capability-matrix.md`; their content is merged into the existing `docs/docs/pages/voice/capability-matrix.mdx`. The error messages in `python/scenario/voice/capabilities.py` and `javascript/src/voice/capabilities.ts` now point at the published URL, and the JS contract-surface test (`javascript/src/voice/__tests__/voice-contract-surface.test.ts`) points at the published mdx.
Process / infra
- `docs/LOW_RISK_PULL_REQUESTS.md` → `.github/`: GitHub-process docs idiomatically live alongside `CODEOWNERS` and `PULL_REQUEST_TEMPLATE`. The path is load-bearing in `.github/workflows/pr-auto-approve.yml`; the workflow regex, validator script, and spec assertions were updated in the same change. Verified via `scripts/validate-pr-auto-approve.sh`.
- `pr-auto-approve.yml` diff fetch + harden reason interpolation: `gh pr diff` hits GitHub's 20k-line API cap on huge PRs (observed on #561, run 26644602950) and crashes the evaluate step before it reaches the workflow's own oversized-diff guard. Two parts: (1) wrap the fetch with a grep-specific catch on the 20k-line error string (auth/network failures still `exit 1`); (2) emit `oversized_reason` via heredoc and pass it through the `env:` pattern to the fail-fast step (the documented GitHub Actions anti-shell-injection pattern, applied even though the source is gh stderr). Supersedes PR #572. Closes #571 (ci: evaluate workflow hard-fails on PRs >20k-line diff instead of its oversized path).
- Remove the duplicate `"target"` key in `javascript/tsconfig.json`, a pre-existing defect that broke the JS test suite under Node 24's strict JSON parser (it was already failing on `main`). This change unblocks that suite.
Spec housekeeping
- Remove the `PR #1 does not modify the legacy approval workflows` scenario in `specs/langwatch-pr-gate-pattern.feature`. It asserted PR #1 left `approval-or-hotfix.yml` and `low-risk-evaluation.yml` untouched, but both workflows have since been deleted on `main`, so the assertion is no longer meaningful. The matching traceability comment is updated to mark it removed. (The separate deletion scenarios for those two workflows in the same spec are unaffected.)
Python / TS parity
- `python/recordings/` → `python/outputs/recordings/`: mirrors the TS rename + nest in PR #561. `outputs/` is the parent for all test-run artifacts (recordings now, traces/logs/screenshots later): "outputs" describes the purpose, "recordings" the format. The helper module and function names are preserved (`_recording_helper.py`, `save_demo_recording`); a new `python/outputs/README.md` documents the parent dir.
- Regenerate `python/scenario/voice/assets/noise/*.wav` to match the layered, distinct `cafe`/`street`/`office`/`airport` ambience (plus the `babble` sample used by the `multiple_voices` effect) now shipping on TS in PR #561. Closes the parity gap in `backgroundNoise()`. Generated with the same deterministic seeded generator as the TS assets (which ship in #561 and are not yet on `main`), so the two are reproducible from one seed; a single canonical generator that writes both targets is tracked as follow-up #588 (voice: fold to one canonical noise-sample generator).
Test sanity
- `CI=true` (they require API keys).
- `"capability matrix" in str(err)` assertion green; the recordings rename is path-only, so the recording artifact contract is unchanged.
Out of scope (kept as one follow-up issue)
- Fold to one canonical noise-sample generator that writes both targets (`python/scenario/voice/assets/noise/*.wav` AND `javascript/src/voice/assets/noise/*.wav`). Architectural change; deferred.
Closes
Closes #571.
Also carries to `main` the work tracked by #587, #589, and #590; those issues were already closed on 2026-05-29, and this PR lands their changes. (GitHub auto-closes only the still-open #571 on merge.)
🤖 Generated with Claude Code