chore: main-side cleanup — docs + spec + python/TS parity #586

Merged: drewdrewthis merged 15 commits into main from chore/docs-cleanup-orphans on Jun 1, 2026

drewdrewthis (Collaborator) commented May 29, 2026

Summary

Consolidated main-side cleanup PR: docs housekeeping + spec housekeeping + python/TS parity fixes. Surfaced by a directory-structure audit on PR #561. Lands the changes for the follow-up issues listed under Closes below (most already closed; see that section).

File-structure changes

What moved, what was deleted, what was added (the recordings tree moved wholesale — 68 files — shown collapsed):

docs/
  voice-bug-bash.md                          ✖ deleted (orphan QA guide, zero refs)
  voice-twilio.md                            ✖ deleted (superseded by voice/adapters/twilio.mdx)
  voice/
    capability-matrix.md                     ✖ deleted ─┐ folded into the published mdx
    happy-path-elevenlabs.md                 ──► docs/docs/pages/voice/happy-path-elevenlabs.mdx   (published)
    happy-path-openai-realtime.md            ──► docs/docs/pages/voice/happy-path-openai-realtime.mdx (published)
  LOW_RISK_PULL_REQUESTS.md                  ──► .github/LOW_RISK_PULL_REQUESTS.md   (GitHub-process doc)
  docs/pages/voice/capability-matrix.mdx     ✎ edited (now the single source-of-truth; sidebar in docs/vocs.config.tsx)

javascript/
  docs/voice/capability-matrix.md            ✖ deleted ─┘ (was a divergent copy)

python/
  recordings/                                ──► python/outputs/recordings/   (whole tree: 18 demos, 68 files)
  outputs/README.md                          ✚ added (documents the new outputs/ parent)
  scenario/voice/assets/noise/*.wav          ✎ refreshed (cafe/street/office/airport + babble)
  scenario/voice/capabilities.py             ✎ edited (error msg → published capability-matrix URL)

javascript/src/voice/capabilities.ts         ✎ edited (error msg → published capability-matrix URL)
.github/workflows/pr-auto-approve.yml        ✎ edited (HTTP-406 oversized-diff handling; new LOW_RISK path)
javascript/tsconfig.json                     ✎ edited (removed duplicate "target" key)
specs/langwatch-pr-gate-pattern.feature      ✎ edited (dropped a stale, no-longer-meaningful scenario)

Changes

Docs housekeeping

  • Delete two orphan docs: docs/voice-bug-bash.md (zero refs, stale, tied to closed PR #355) and docs/voice-twilio.md (superseded by the published voice/adapters/twilio.mdx).
  • Publish the two happy-path guides: docs/voice/happy-path-{elevenlabs,openai-realtime}.md → docs/docs/pages/voice/*.mdx, with sidebar entries in docs/vocs.config.tsx so they render on the docs site.
  • Fold 3 capability-matrix files into 1 published source-of-truth — deleted docs/voice/capability-matrix.md and javascript/docs/voice/capability-matrix.md; merged into the existing docs/docs/pages/voice/capability-matrix.mdx. The error messages in python/scenario/voice/capabilities.py and javascript/src/voice/capabilities.ts now point at the published URL, and the JS contract-surface test (javascript/src/voice/__tests__/voice-contract-surface.test.ts) points at the published mdx.
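
A minimal Python sketch of the error-message change described above. The class name, the helper, and the URL are placeholders rather than the repository's real identifiers; the only detail taken from this PR is that `str(err)` keeps the "capability matrix" substring and now carries a published URL.

```python
# Hypothetical names throughout; the real code lives in
# python/scenario/voice/capabilities.py and is not reproduced here.
CAPABILITY_MATRIX_URL = "https://example.invalid/voice/capability-matrix"  # placeholder URL


class UnsupportedCapabilityError(Exception):
    """Raised when an adapter is asked for a capability it does not declare."""

    def __init__(self, adapter: str, capability: str) -> None:
        self.adapter = adapter
        self.capability = capability
        super().__init__(
            f"{adapter} does not support '{capability}'. "
            f"See the capability matrix: {CAPABILITY_MATRIX_URL}"
        )


def require_capability(adapter: str, declared: set[str], capability: str) -> None:
    """Raise if `capability` is missing from the adapter's declared set."""
    if capability not in declared:
        raise UnsupportedCapabilityError(adapter, capability)
```

Pointing the message at one published page means the Python and TS errors cannot drift toward different unpublished markdown copies.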

Process / infra

  • Move docs/LOW_RISK_PULL_REQUESTS.md → .github/: GitHub-process docs idiomatically live alongside CODEOWNERS and PULL_REQUEST_TEMPLATE. The path is load-bearing in .github/workflows/pr-auto-approve.yml; the workflow regex, validator script, and spec assertions were updated in the same change. Verified via scripts/validate-pr-auto-approve.sh.
  • Catch HTTP 406 in pr-auto-approve.yml diff fetch + harden reason interpolation: gh pr diff hits GitHub's 20k-line API cap on huge PRs (observed on #561, run 26644602950) and crashes the evaluate step before reaching the workflow's own oversized-diff guard. Two parts: (1) wrap the fetch with a grep-specific catch on the 20k-line error string (auth/network failures still exit 1); (2) emit oversized_reason via heredoc and pass it through the env: pattern to the fail-fast step (the documented GitHub Actions anti-shell-injection pattern, even when the source is gh stderr). Supersedes PR #572. Closes #571 (ci: evaluate workflow hard-fails on PRs >20k-line diff instead of its oversized path).
  • Fix a duplicate "target" key in javascript/tsconfig.json — a pre-existing defect that broke the JS test suite under Node 24's strict JSON parser (it was already failing on main). This change unblocks that suite.

Spec housekeeping

  • Remove the stale PR #1 does not modify the legacy approval workflows scenario in specs/langwatch-pr-gate-pattern.feature. It asserted that PR #1 left approval-or-hotfix.yml and low-risk-evaluation.yml untouched, but both workflows have since been deleted on main, so the assertion is no longer meaningful. The matching traceability comment is updated to mark it removed. (The separate deletion scenarios for those two workflows in the same spec are unaffected.)

Python / TS parity

  • Rename python/recordings/ → python/outputs/recordings/: mirrors the TS rename + nest in PR #561. outputs/ is the parent for all test-run artifacts (recordings now, traces/logs/screenshots later): "outputs" describes the purpose, "recordings" the format. The helper module and function names are preserved (_recording_helper.py, save_demo_recording); a new python/outputs/README.md documents the parent dir.
  • Refresh python/scenario/voice/assets/noise/*.wav to match the layered, distinct cafe / street / office / airport ambience (plus the babble sample used by the multiple_voices effect) now shipping on TS in PR #561. Closes the parity gap in backgroundNoise(). Generated with the same deterministic seeded generator as the TS assets (which ship on #561, not yet on main), so the two are reproducible from one seed; a single canonical generator that writes both targets is tracked as follow-up #588 (voice: fold to one canonical noise-sample generator).
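
To illustrate what "reproducible from one seed" means, here is a small Python sketch. It is not the generator from #561 (that script is not on main and is not reproduced here); plain seeded white noise stands in for the layered ambience, and the function names are hypothetical. Only the format (mono PCM16, 24 kHz, 3 s) is taken from the commit notes.

```python
import hashlib
import io
import random
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, per the commit notes (mono PCM16 24kHz)
DURATION_S = 3        # seconds


def render_noise(seed: int) -> bytes:
    """Render 3 s of seeded white noise as a mono PCM16 WAV and return its bytes."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    samples = [rng.randint(-8000, 8000) for _ in range(SAMPLE_RATE * DURATION_S)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def digest(data: bytes) -> str:
    """SHA-256 of the rendered file, handy for comparing two asset bundles."""
    return hashlib.sha256(data).hexdigest()
```

Two calls with the same seed yield identical bytes, so comparing digests is enough to check two bundles against each other. Reproducing the same bytes across languages would additionally require both sides to use the same PRNG algorithm, which this sketch does not address.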

Test sanity

  • Python voice unit + capability + recording suites pass with 0 regressions (see CI). Live-LLM / integration tests are deselected under CI=true (they require API keys).
  • The capability error-message refactor keeps the existing "capability matrix" in str(err) assertion green; the recordings rename is path-only, so the recording artifact contract is unchanged.

Out of scope (kept as one follow-up issue)

Closes

Closes #571.

Also carries to main the work tracked by #587, #589, and #590 — those issues were already closed on 2026-05-29; this PR lands their changes. (GitHub auto-closes only the still-open #571 on merge.)

🤖 Generated with Claude Code

drewdrewthis and others added 3 commits May 29, 2026 13:58
…es closed PR #355)

Internal QA guide superseded by PR #355's merge. Zero references in
the repo. The file referenced a workflow that no longer matches the
current voice-adapter architecture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed voice/adapters/twilio.mdx)

The orphan was a hands-on smoke-test playbook (Twilio trial setup,
cloudflared install, 3 runnable smoke scripts, manual webhook reset
recipe). It is superseded by the published reference doc at
docs/docs/pages/voice/adapters/twilio.mdx, which is the canonical
user-facing source.

The orphan referenced example files that no longer exist in the
current tree (voice_pipecat_scenario.py,
voice_twilio_agent_answers_scenario.py,
voice_twilio_simulator_calls_human_scenario.py), so keeping it would
have actively misled users with dead references.

Verified zero external refs:
  rg -l voice-twilio --hidden  # empty

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…process doc convention)

Process docs idiomatically live in .github/ alongside CODEOWNERS,
PULL_REQUEST_TEMPLATE, and contributing guides. The Low-Risk PR
policy is GitHub-workflow scaffolding, not user-facing product docs.

Coordinated updates to every load-bearing reference:
- .github/workflows/pr-auto-approve.yml — RESTRICTED_PATTERN regex,
  3 file-path uses, 2 GitHub URLs, 1 comment.
- specs/langwatch-pr-gate-pattern.feature — 3 spec assertions and
  the AC-X4 coverage-map comment.
- scripts/validate-pr-auto-approve.sh — EXPECTED_PATTERN regex
  (must match the workflow byte-for-byte), AC-X3 policy-path
  grep assertions and messages, comment-URL assertion, plus a
  new fail-guard against the prior docs/ path resurfacing.

Verified the validator runs clean on the new state (AC-X3 + AC-1.6
all pass). Two pre-existing AC-1.11 failures on main are unrelated
to this move (legacy workflows already deleted by a later PR in the
gate-swap sequence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
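
The fail-guard idea from this commit (require the new path, reject the old one) can be sketched in Python. The real validator is a shell script, scripts/validate-pr-auto-approve.sh, and is not reproduced here; the function name is hypothetical, and the sketch ignores regex-escaped forms of the path that the real workflow pattern may contain.

```python
OLD_PATH = "docs/LOW_RISK_PULL_REQUESTS.md"
NEW_PATH = ".github/LOW_RISK_PULL_REQUESTS.md"


def check_policy_path(workflow_text: str) -> list[str]:
    """Return a list of problems; an empty list means the text is consistent."""
    problems = []
    if NEW_PATH not in workflow_text:
        problems.append(f"missing reference to {NEW_PATH}")
    # NEW_PATH does not contain OLD_PATH as a substring, so a plain
    # containment test is enough to detect the old location resurfacing.
    if OLD_PATH in workflow_text:
        problems.append(f"stale reference to {OLD_PATH}")
    return problems
```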
@drewdrewthis drewdrewthis self-assigned this May 29, 2026
drewdrewthis and others added 5 commits May 29, 2026 14:25
Audit feedback: these were real docs with real refs but were
unpublished — discoverable only by grep. Moved into the vocs
pages tree + added to the sidebar so users actually find them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f-truth

Audit feedback: three files claimed "capability matrix" with slightly
different content (Python source-of-truth, TS mirror, published mdx).
Reduced to one: the published mdx is now the canonical source. Both
sides reference the public URL. Error messages in Python and TS code
now point users at the live docs URL instead of an unpublished
markdown file. JS contract-surface test points at the published mdx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rkflows

Workflows approval-or-hotfix.yml and low-risk-evaluation.yml were
deleted in PR #4 of the gate-swap sequence (already merged on main)
but the spec still asserted the diff of PR #1 must not modify them.
Both PRs have long-since landed and the files no longer exist on main,
so the assertion is no longer meaningful. The AC mapping comment
is preserved with a tombstone explaining the removal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y with TS)

Mirrors the TS rename in PR #561. "Outputs" describes the purpose
(what the example tests produced); "recordings" describes the format
(audio recordings). The recording artifact name is preserved inside
the helper (save_demo_recording, _recording_helper.py, _RECORDINGS_ROOT
variable) — only the on-disk dir changes.

Updates:
- git mv python/recordings → python/outputs (68 files; rename detected)
- python/examples/voice/_recording_helper.py: directory path + docstrings
- python/examples/voice/pipecat_{scenario,ws}.py: comment paths
- python/.env.example: SCENARIO_LOG_FILE default
- .gitignore: all python/recordings patterns
- .github/workflows/voice-integration.yml: upload-artifact path
- specs/voice-agents.feature: 3 mentions in scenario text
- specs/voice-docs-surface.feature: 1 mention in troubleshooting scenario

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…red ambience)

Python was shipping 0.5s/24KB single-tone placeholders; TS upgraded to
3s/144KB layered cafe/office/highway/wind ambience in PR #561. Bringing
Python into parity so backgroundNoise() ships the same quality on
both sides. Byte-identical to the TS asset bundle (same deterministic
seeded generator).

WAVs: 144044 bytes each, 3.00s, mono PCM16 24kHz.
LICENSES.md: updated to describe the new generator + layered content;
notes the cross-language byte-identical copy with a reference to the
single-canonical-generator follow-up at #588.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
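
The stated file size follows from the format. A short Python check, assuming the canonical 44-byte PCM WAV header (the function name is illustrative):

```python
def pcm_wav_size(duration_s: float, sample_rate: int, channels: int = 1,
                 bytes_per_sample: int = 2, header_bytes: int = 44) -> int:
    """Size in bytes of a canonical PCM WAV file: header plus raw samples."""
    frames = round(duration_s * sample_rate)
    return header_bytes + frames * channels * bytes_per_sample


# 3.00 s, mono, PCM16, 24 kHz: 44 + 72000 * 1 * 2 bytes
new_size = pcm_wav_size(3.0, 24_000)
```

Here 3.00 s × 24,000 Hz × 2 bytes = 144,000 bytes of samples, plus the 44-byte header, gives the 144044 bytes quoted in the commit. If the old 0.5 s placeholders used the same format (an assumption), the same formula gives about 24 KB, in line with the "0.5s/24KB" figure.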
@drewdrewthis drewdrewthis changed the title chore: docs cleanup — delete 2 stale orphans, move LOW_RISK policy to .github/ chore: main-side cleanup — docs + spec + python/TS parity May 29, 2026
drewdrewthis and others added 2 commits May 29, 2026 14:59
User feedback: outputs/ should be a parent for all test-run artifact
types (recordings now, traces/logs/screenshots later). Adds the
recordings/ nesting + a new outputs/README.md describing the shape.
Writer in _recording_helper.py updated to point at outputs/recordings/.

Symmetric with the TS-side nesting in PR #561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
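
A sketch of the nested layout in Python. The PR names _recording_helper.py and save_demo_recording but does not show their code, so the function names, signatures, and the per-demo subdirectory below are assumptions for illustration only.

```python
from pathlib import Path

OUTPUTS_DIRNAME = "outputs"        # parent for all test-run artifacts
RECORDINGS_DIRNAME = "recordings"  # one artifact type under that parent


def recordings_root(python_dir: Path) -> Path:
    """Directory that holds recordings: <python_dir>/outputs/recordings."""
    return python_dir / OUTPUTS_DIRNAME / RECORDINGS_DIRNAME


def demo_recording_path(python_dir: Path, demo: str, filename: str) -> Path:
    """Path for one demo's recording, creating parent directories as needed."""
    path = recordings_root(python_dir) / demo / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Keeping the parent and child names as separate constants leaves room for sibling artifact types (traces, logs, screenshots) under the same outputs/ parent.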
JSON allows duplicate keys (last wins) but Node 24's tsx/oxc parser
rejects with a strict JSON parse error. This was breaking 21 test
files on the ci-checks (24.x) CI matrix on both main and PR #586.
Removes the duplicate; first occurrence (same value, ES2022) is kept.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
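
A duplicate key like this is easy to miss because lenient parsers accept it silently. A minimal Python sketch of a detector, using the standard `object_pairs_hook` parameter of `json.loads`; the function name and the sample text are illustrative, and a real tsconfig with comments or trailing commas would need stripping first, since `json.loads` rejects those.

```python
import json


def find_duplicate_keys(text: str) -> list[str]:
    """Return keys that appear more than once within a single JSON object."""
    duplicates: list[str] = []

    def record_pairs(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # same last-wins result json.loads gives by default

    json.loads(text, object_pairs_hook=record_pairs)
    return duplicates


# Illustrative input, not the actual contents of javascript/tsconfig.json.
sample = '{"compilerOptions": {"target": "ES2022", "strict": true, "target": "ES2022"}}'
found = find_duplicate_keys(sample)
```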
drewdrewthis and others added 3 commits May 29, 2026 17:41
Pre-existing failure masked by the tsconfig parse error; surfaced
after PR #586's tsconfig fix unblocked the test loader.

Commit 71dd5ed (origin/main, 2026-05-29) added 'interruption' to the
feature step:
  And it declares: streaming_transcripts, native_vad, dtmf, interruption,
                   input_formats, output_formats
but did not update the matching binding string in
voice-contract-surface.test.ts, so vitest-cucumber raised
StepAbleUnknowStepError for 'Scenario: Every adapter publishes a
capabilities attribute' at specs/voice-agents.feature:751.

The binding body already asserted empty.interruption — only the
title string was stale. Aligning it restores the green path:

  Test Files  1 passed (1)
  Tests       16 passed (16)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gemini-3-pro-preview returned 404 on the live Gemini API:
"This model models/gemini-3-pro-preview is no longer available.
Please update your code to use a newer model for the latest features
and improvements."

Swapped both occurrences (JudgeAgent + litellm.completion) for
gemini-2.5-pro — current stable pro-tier Gemini, same intent as the
original (pro-tier judge). The other Gemini reference in this file
(UserSimulatorAgent on gemini-2.5-flash) and other repo Gemini call
sites already use the gemini-2.5 family.

Pre-existing failure on main; this unblocks main CI and PR #586.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `gh pr diff` API returns 406 when a PR's diff exceeds GitHub's 20k-line
cap (observed on PR #561's evaluate run 26644602950). The downstream
"Fail fast for oversized diffs" step already handles `oversized=true`
gracefully (sets qualifies=false, exits 0, posts manual-review-required
review), but the upstream `gh pr diff` call crashes before reaching it.

Wrap the fetch: on HTTP 406, set oversized=true and exit 0. Other
failures still propagate as exit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
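
The control flow of this fix, re-expressed in Python for illustration. The real change is shell inside pr-auto-approve.yml and is not reproduced here. The marker text is an assumption about GitHub's error wording, which this PR does not quote, and the function names are hypothetical.

```python
import subprocess

# Assumed marker: the workflow greps for GitHub's 20k-line error string,
# whose exact wording is not quoted in this PR.
OVERSIZED_MARKER = "diff exceeded the maximum number of lines"


def classify_diff_fetch(returncode: int, stderr: str) -> str:
    """Map a `gh pr diff` result to 'ok', 'oversized', or 'error'."""
    if returncode == 0:
        return "ok"
    if "406" in stderr and OVERSIZED_MARKER in stderr:
        return "oversized"  # downstream step sets oversized=true and exits 0
    return "error"          # auth/network failures still fail the step


def fetch_diff(pr_number: int) -> tuple[str, str]:
    """Run `gh pr diff` and return (status, diff text or stderr)."""
    proc = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=False,
    )
    status = classify_diff_fetch(proc.returncode, proc.stderr)
    return status, (proc.stdout if status == "ok" else proc.stderr)
```

Matching on the specific error text instead of any non-zero exit keeps unrelated failures loud, which is the point of the "other failures still propagate" rule.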
Follow-up to bafdbf7: the 406-handling fix now emits an
oversized_reason output so the downstream "Fail fast for oversized
diffs" step can post a more honest review explaining WHY evaluation
was skipped (fetch failure vs char-count exceeded).

Pass the reason via env: pattern (intermediate environment variable,
not direct ${{ }} expansion in run:) per the documented GitHub Actions
shell-injection hardening pattern. Use heredoc for GITHUB_OUTPUT writes
since gh CLI stderr can be multi-line and would otherwise corrupt the
output-file format.

Pattern source: PR #572 commit 7dd9311.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
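
A Python sketch of the two hardening pieces, assuming the standard GitHub Actions multi-line output syntax (`name<<DELIMITER`, value lines, `DELIMITER`). The workflow itself does this in shell; the function names and the OVERSIZED_REASON variable name are hypothetical.

```python
import os
import uuid


def write_multiline_output(name: str, value: str, output_path: str) -> None:
    """Append `name` to a GITHUB_OUTPUT-style file using the delimiter syntax."""
    # Random delimiter so the value cannot contain it by accident.
    delimiter = f"ghadelimiter_{uuid.uuid4().hex}"
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")


def read_reason_from_env() -> str:
    """Read the reason from the environment instead of inlined script text.

    OVERSIZED_REASON is a hypothetical variable name. The point is that the
    value arrives as data and is never interpolated into the script source.
    """
    return os.environ.get("OVERSIZED_REASON", "")
```

A single-line `name=value` write would break on multi-line stderr, and the intermediate environment variable keeps shell metacharacters in the reason from being evaluated.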
drewdrewthis (Collaborator, Author) commented May 31, 2026

/prove-it: PASS + /review: clean (re-verified at 06489d8, after addressing review findings).

All ACs across #571 / #587 / #589 / #590 + PR-body claims map to first-hand evidence:

Review findings addressed (commit 06489d8):

  1. (blocking) scripts/validate-pr-auto-approve.sh AC-1.11 block asserted two workflows exist that were deleted in PR #4 → validator exited 1 despite the workflow being correct. The PR body cited this script as verification. Fixed: stale block removed, validator now exits 0 (mirrors the spec's own AC-1.11 removal).
  2. LICENSES.md regen instruction pointed at generate-noise-samples.mjs, which ships on #561 and isn't on main yet. Fixed: reworded to reference #561 + follow-up #588.
  3. PR-body "byte-identical to the TS asset bundle" softened: there is no TS bundle on main to compare against; assets are reproducible from the same seed.

Non-fixes (deliberate): spec Background line listing the two now-deleted workflows is the pre-sequence (t0) baseline every scenario references — not a current-state claim — so it's correct as-is. Filed #592 for the systemic gap that the validator is never run in CI.

🤖 /prove-it + /review via Claude Code

The AC-1.11 block asserted approval-or-hotfix.yml and low-risk-evaluation.yml
still exist, but both were deleted in PR #4 of the gate-swap sequence, so the
validator exited 1 even though the workflow is correct. The matching spec
scenario was already removed in this PR; this aligns the validator script.

Also fix the noise-sample LICENSES.md regeneration note, which pointed at
generate-noise-samples.mjs — a script that ships with PR #561 and is not on
main yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
drewdrewthis (Collaborator, Author) commented

📣 Slack PR-review request posted to #dev — thread. (CI 🟢 on 06489d8, /prove-it + /review clean.) Label slack-requested may be absent due to the org OAuth-App restriction on this token; this comment is the dedup marker.

github-actions Bot (Contributor) commented Jun 1, 2026

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

This PR's diff exceeds the size limit for automated low-risk evaluation. Manual review required.

This PR requires a manual review before merging.

drewdrewthis added a commit that referenced this pull request Jun 1, 2026
@drewdrewthis drewdrewthis merged commit 371f94c into main Jun 1, 2026
25 checks passed
@drewdrewthis drewdrewthis deleted the chore/docs-cleanup-orphans branch June 1, 2026 15:43