fix(media): suppress local whisper progress transcripts #87393
kesslerio wants to merge 7 commits
Codex review: needs maintainer review before merge. Reviewed June 1, 2026, 11:52 PM ET / 03:52 UTC.
Summary
PR surface: Source +30, Tests +337, Docs 0. Total +367 across 5 files.
Reproducibility: yes, at source level. Current main reads inferred transcript files only when non-empty and otherwise falls through to stdout, while the linked issue gives a concrete wrapper/progress-stdout reproduction shape. I did not execute a failing current-main run because this review is read-only.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land a focused media-runner fix that makes deterministic file-output STT paths authoritative, with explicit maintainer acceptance or removal of the local-whisper wrapper exception before merge.
Do we have a high-confidence way to reproduce the issue? Yes, at source level: current main reads inferred transcript files only when non-empty and otherwise falls through to stdout, while the linked issue gives a concrete wrapper/progress-stdout reproduction shape. I did not execute a failing current-main run because this review is read-only.
Is this the best way to solve the issue? Mostly yes: making known deterministic transcript-file outputs authoritative is the narrowest repair for the bug. The local-whisper path-based exception is the remaining owner-boundary and compatibility decision before merge.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 39f319c7a47e.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +30, Tests +337, Docs 0. Total +367 across 5 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc49c6db76
  : commandId === "node"
    ? resolveNodeWhisperWrapperOutputPath(params.args, params.mediaPath)
    : null;
const fileOutputAuthoritative = commandId === "node" && fileOutputPath !== null;
Don't fall back from inferred file outputs
When a file path is inferred for whisper-cli, whisper, or parakeet-mlx, fileOutputAuthoritative remains false, so an empty transcript file skips the return here and falls through to params.stdout.trim() later. In configurations such as whisper ... --output_format txt --output_dir {{OutputDir}} where stdout contains progress or banner text, that status text is still injected as the transcript, leaving the file-output CLI regression unfixed for the built-in paths this runner already recognizes.
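The behavior requested here can be sketched as a small selection function. This is a minimal illustration, not the runner's code: `CliResult` and `selectTranscript` are hypothetical names, and the only idea taken from the comment above is that an inferred file path should make the file the sole transcript source.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the runner's real API.
interface CliResult {
  stdout: string;
  // Path the runner inferred for the transcript file, or null if none was inferred.
  fileOutputPath: string | null;
  // Contents read from that path, or null if the file was missing.
  fileText: string | null;
}

function selectTranscript(result: CliResult): string | null {
  if (result.fileOutputPath !== null) {
    // A transcript file was expected, so it is the only valid source.
    // Progress or banner text on stdout is never used as a fallback here.
    const text = (result.fileText ?? "").trim();
    return text.length > 0 ? text : null;
  }
  // No file output was inferred: stdout is the transcript channel.
  const text = result.stdout.trim();
  return text.length > 0 ? text : null;
}
```

With this shape, an empty transcript file plus banner text on stdout yields `null` rather than the banner, while CLIs with no inferred file path keep using stdout.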
|
Updated this PR after ClawSweeper feedback.
Head:
What changed:
Reviewer/bot feedback addressed:
Proof:
Validation:
pnpm exec oxfmt --write src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
pnpm exec oxfmt --check --threads=1 src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 150s node scripts/run-vitest.mjs src/media-understanding/runner.cli-audio.test.ts -t "whisper-cli inferred|inferred whisper transcript|parakeet-mlx inferred|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
Result:
Current state:
@clawsweeper re-review
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Proof: real Telegram audio path at PR head ec4e753
I added a channel-level proof pass after the earlier focused test/typecheck proof.
Setup:
Redacted gateway transcript:
2026-05-27T13:05:57.270-07:00 [telegram] [default] starting provider (@kesslerClawBot)
2026-05-27T13:06:00.435-07:00 [telegram] update: {... "voice":{"duration":6,"mime_type":"audio/ogg","file_size":125925}}
2026-05-27T13:06:03.048-07:00 audio-preflight: transcribing attachment 0 for mention check
2026-05-27T13:06:03.064-07:00 Media understanding via CLI: node .../proof/local-whisper/transcribe.js .../media/inbound/<redacted>.ogg --output-dir /tmp/openclaw/openclaw-media-cli-<redacted>
2026-05-27T13:06:03.151-07:00 Whisper Voice Transcription
Model: proof
Transcribing with Whisper...
2026-05-27T13:06:03.162-07:00 Media understanding audio: skipped (0/1) reason=empty output
2026-05-27T13:06:03.182-07:00 telegram inbound: ... preview="[Telegram Art Kessler (@kesslerIO) id:8352721935 +1h Wed 2026-05-27 13:05 PDT] <media:audio>"
2026-05-27T13:06:03.191-07:00 [telegram] Inbound message telegram:<redacted> -> @kesslerClawBot (direct, audio/ogg, 13 chars)
2026-05-27T13:06:03.809-07:00 Media understanding via CLI: node .../proof/local-whisper/transcribe.js .../media/inbound/<redacted>.ogg --output-dir /tmp/openclaw/openclaw-media-cli-<redacted>
2026-05-27T13:06:03.918-07:00 Whisper Voice Transcription
Model: proof
Transcribing with Whisper...
2026-05-27T13:06:03.926-07:00 Media understanding audio: skipped (0/1) reason=empty output
What this proves:
Additional direct CLI check against the same disposable staging config:
OPENCLAW_CONFIG_PATH=.../openclaw.json OPENCLAW_STATE_DIR=.../state \
node scripts/run-node.mjs infer audio transcribe --file /tmp/openclaw-whisper-proof-o3cW2g/sample.wav --json
Error: No transcript returned for audio: /tmp/openclaw-whisper-proof-o3cW2g/sample.wav
That direct check also confirms stdout progress did not become a transcript.
Validation already run on this head:
pnpm exec oxfmt --check --threads=1 src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 150s node scripts/run-vitest.mjs src/media-understanding/runner.cli-audio.test.ts -t "whisper-cli inferred|inferred whisper transcript|parakeet-mlx inferred|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
|
|
Updated after the latest ClawSweeper review.
Head:
What changed:
Feedback addressed:
Validation:
pnpm exec oxfmt --check --threads=1 src/media-understanding/apply.test.ts src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts -t "parakeet|whisper-cli inferred|inferred whisper transcript|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
node --input-type=module <local real-behavior-proof-policy check against PR body>
Result:
Current state:
@clawsweeper re-review
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Addressed the ClawSweeper P2 docs feedback on head
What changed:
Validation:
pnpm exec oxfmt --check --threads=1 docs/nodes/audio.md docs/nodes/media-understanding.md
git diff --check
pnpm test src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts
pnpm check:docs
Results:
This should address the remaining author-facing blocker from the durable ClawSweeper review.
@clawsweeper re-review
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review
Current head
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review
Current head
What changed since the last review:
Validation run locally:
pnpm exec oxfmt --check --threads=1 docs/nodes/audio.md docs/nodes/media-understanding.md src/media-understanding/apply.test.ts src/media-understanding/runner.cli-audio.test.ts src/media-understanding/runner.entries.ts
git diff --check origin/main...HEAD
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts -t "parakeet|whisper-cli inferred|inferred whisper transcript|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts
Results:
One validation limitation: the broad core tsgo command now fails on unrelated current-main declaration issues for
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
What
CLI audio transcription now treats deterministic transcript-file outputs as authoritative instead of falling back to progress/status stdout.
This covers:
- whisper-cli with -otxt/--output-txt and -of/--output-file
- whisper with --output_format txt and --output_dir
- parakeet-mlx with explicit --output-format txt and --output-dir
- --output-dir, --output, and -o
Empty or missing inferred transcript files now produce no transcript instead of exposing status/progress output as user text. Unexpected transcript-file read errors still surface. Other Node wrappers, parakeet default/non-txt output, and other non-authoritative CLI paths keep the existing stdout fallback behavior.
Fixes #87384.
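For readers unfamiliar with what "inferred transcript file" means for the `whisper` case listed above, here is a minimal sketch of one way such a path could be derived from the CLI arguments. `inferWhisperTxtPath` is a hypothetical helper, not the function in `runner.entries.ts`, and the file naming rule (input basename without extension, plus `.txt`, inside `--output_dir`) is an assumption about the upstream CLI rather than something this PR states.

```typescript
import * as path from "node:path";

// Hypothetical helper: the real runner's inference logic may differ.
function inferWhisperTxtPath(args: string[], mediaPath: string): string | null {
  // Return the value that follows a flag, if the flag is present.
  const valueOf = (flag: string): string | undefined => {
    const i = args.indexOf(flag);
    return i >= 0 && i + 1 < args.length ? args[i + 1] : undefined;
  };

  const outputDir = valueOf("--output_dir");
  const format = valueOf("--output_format");
  if (outputDir === undefined || format !== "txt") {
    // Without both flags the output location is not deterministic in this sketch.
    return null;
  }

  // Assumption: the transcript is named after the input file, minus its extension.
  const base = path.basename(mediaPath, path.extname(mediaPath));
  return path.join(outputDir, `${base}.txt`);
}
```

When the helper returns `null`, a caller would treat the CLI as non-authoritative and keep the stdout fallback, which matches the scope described in the list above.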
Why
Some local speech-to-text wrappers print progress banners to stdout while writing the actual transcript to a file. When that transcript file is empty or missing, OpenClaw could previously treat the progress banner as the user transcript. That is wrong for live audio messages because wrapper status text can leak into the inbound conversation body.
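The failure mode can be reproduced in miniature. The sketch below is an illustration under stated assumptions: `fakeWrapper` stands in for a local wrapper that prints a progress banner and writes an empty transcript file, and `legacyTranscript` is a hypothetical model of the old "file, else stdout" fallback, not the actual pre-fix source.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface WrapperRun {
  stdout: string;
  transcriptPath: string;
}

// Hypothetical stand-in for a local wrapper: banner on stdout, empty transcript on disk.
function fakeWrapper(outputDir: string): WrapperRun {
  const transcriptPath = path.join(outputDir, "sample.txt");
  fs.writeFileSync(transcriptPath, ""); // The model produced no text.
  return {
    stdout: "Whisper Voice Transcription\nTranscribing with Whisper...\n",
    transcriptPath,
  };
}

// Model of the fallback described above: use stdout when the file is empty.
function legacyTranscript(run: WrapperRun): string {
  const fileText = fs.readFileSync(run.transcriptPath, "utf8").trim();
  return fileText || run.stdout.trim();
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "whisper-sketch-"));
const leaked = legacyTranscript(fakeWrapper(dir));
console.log(JSON.stringify(leaked)); // The banner, not a transcript.
fs.rmSync(dir, { recursive: true, force: true });
```

Running this shows the banner text being returned as if it were speech, which is the leak into the conversation body that this PR removes.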
Real behavior proof
Behavior or issue addressed: local Whisper-style CLI wrapper status text could be injected as an audio transcript when the expected transcript file was empty or missing.
Real environment tested: refreshed PR head 3f383768675217b456189d972867a1d0bdde7858, disposable OpenClaw staging state, stored Telegram DM voice media (audio/ogg) from @kesslerClawBot staging proof, and a controlled local-whisper-compatible wrapper that printed Whisper progress to stdout, wrote an empty inferred transcript file under {{OutputDir}}, and exited successfully.
Exact steps or command run after this patch: merged current origin/main into the PR branch, then ran a current-head direct media runtime proof through src/media-understanding/runner.entries.ts / runCliEntry with node <proof-wrapper> <stored-telegram-audio.ogg> --output-dir <tmp>. The same wrapper/media path was previously exercised through the disposable Telegram gateway with @kesslerClawBot.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): public proof gist with current-head direct runtime output plus the redacted live Telegram gateway excerpt: https://gist.github.com/kesslerio/97110dff154bef75ddb80f35c5cdc8e8
Prior live Telegram gateway excerpt for the same wrapper behavior:
Observed result after fix: current-head runCliEntry returned null for the empty authoritative transcript file; in the live Telegram proof, OpenClaw logged wrapper progress output but kept the inbound Telegram body as <media:audio>. The progress stdout was not injected as transcript text.
What was not tested: real Whisper model transcription quality and real upstream parakeet-mlx binary execution were not tested in the staging proof. The parakeet compatibility path is covered by focused regression tests instead.
Proof limitations or environment constraints: the current-head proof uses a controlled wrapper to force the empty-output condition deterministically. It proves OpenClaw's wrapper-output handling on the changed source path; the live Telegram proof was captured before the current-main merge, but the refreshed PR diff still only changes the same five media/doc/test files against current origin/main.
Before evidence (optional but encouraged): Bug: CLI audio transcription can use progress stdout when transcript file is empty #87384 describes the reported failure mode; current main fell back from missing/empty inferred transcript files to stdout for known CLI paths.
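The observed result (no transcript for an empty authoritative file), combined with the rule that unexpected read errors still surface, can be sketched as a small file reader. `readAuthoritativeTranscript` is a hypothetical helper written for illustration; it is not `runCliEntry`, and the choice of `ENOENT` as the only "expected" error is an assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper, not the PR's runCliEntry.
function readAuthoritativeTranscript(filePath: string): string | null {
  let raw: string;
  try {
    raw = fs.readFileSync(filePath, "utf8");
  } catch (err) {
    // Assumption: a missing file means "no transcript"; any other error is rethrown.
    if ((err as { code?: string }).code === "ENOENT") {
      return null;
    }
    throw err;
  }
  const text = raw.trim();
  return text.length > 0 ? text : null;
}

// Small demonstration with a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "transcript-sketch-"));
const emptyFile = path.join(dir, "empty.txt");
fs.writeFileSync(emptyFile, "");
const missing = readAuthoritativeTranscript(path.join(dir, "missing.txt"));
const empty = readAuthoritativeTranscript(emptyFile);
fs.rmSync(dir, { recursive: true, force: true });
```

Both `missing` and `empty` end up as "no transcript", so a caller never has a reason to consult stdout for these cases, while permission or I/O failures still propagate to the caller.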
Validation
Result:
Validation limitation: timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo failed after the current-main merge on declaration errors in unrelated current-main files for proper-lockfile, cross-spawn, and hosted-git-info. No media-understanding files were reported in that failure.
CI Status
CI is running on current head 3f383768675217b456189d972867a1d0bdde7858.
The branch now includes current origin/main (7d5d62511f) and contains scripts/github/dependency-guard.mjs, so the previous Dependency Guard failure from checking out 53ad531df9 without that script should be cleared by this head.
Maintainer Edits
Maintainer edits are enabled on this fork PR.
AI Assistance
AI-assisted implementation and review. Human-run proof was captured through the disposable staging gateway and supplemented with a current-head direct runtime proof. Local review included scoped simplify/code-review passes and focused media-understanding validation.