
fix(media): suppress local whisper progress transcripts #87393

Open
kesslerio wants to merge 7 commits into openclaw:main from kesslerio:fix/local-whisper-cli-output


@kesslerio
Contributor

@kesslerio kesslerio commented May 27, 2026

What

CLI audio transcription now treats deterministic transcript-file outputs as authoritative instead of falling back to progress/status stdout.

This covers:

  • whisper-cli with -otxt / --output-txt and -of / --output-file
  • whisper with --output_format txt and --output_dir
  • parakeet-mlx with explicit --output-format txt and --output-dir
  • the local Whisper Node wrapper output paths via --output-dir, --output, and -o

Empty or missing inferred transcript files now produce no transcript instead of exposing status/progress output as user text. Unexpected transcript-file read errors still surface. Other Node wrappers, parakeet default/non-txt output, and other non-authoritative CLI paths keep the existing stdout fallback behavior.
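The covered path rules above can be sketched roughly as follows. This is a hypothetical TypeScript helper (`inferTranscriptPath` is not OpenClaw's actual function name) that only handles simple space-separated flags. It assumes whisper.cpp writes `<-of base>.txt` and that openai-whisper and parakeet-mlx write `<output dir>/<media stem>.txt`. The local Whisper Node wrapper is left out because its file-naming rule is not spelled out in this PR description.

```typescript
// Hypothetical sketch: which known CLI invocations name a deterministic .txt transcript.
// Simplified flag parsing: no "--flag=value" forms, first match wins.

function flagValue(args: readonly string[], names: readonly string[]): string | undefined {
  const i = args.findIndex((a) => names.includes(a));
  return i >= 0 ? args[i + 1] : undefined;
}

function hasFlag(args: readonly string[], names: readonly string[]): boolean {
  return args.some((a) => names.includes(a));
}

// Basename without its last extension (POSIX-style paths only).
function stem(mediaPath: string): string {
  const base = mediaPath.slice(mediaPath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(0, dot) : base;
}

function inferTranscriptPath(
  command: string,
  args: readonly string[],
  mediaPath: string,
): string | null {
  switch (command) {
    case "whisper-cli": {
      // -otxt plus an explicit -of base; other whisper-cli forms are not modeled here.
      const base = flagValue(args, ["-of", "--output-file"]);
      return hasFlag(args, ["-otxt", "--output-txt"]) && base ? `${base}.txt` : null;
    }
    case "whisper": {
      // Assumed layout: <output_dir>/<media stem>.txt
      const dir = flagValue(args, ["--output_dir"]);
      const fmt = flagValue(args, ["--output_format"]);
      return fmt === "txt" && dir ? `${dir}/${stem(mediaPath)}.txt` : null;
    }
    case "parakeet-mlx": {
      // Only an explicit "--output-format txt" counts; omitted or non-txt keeps stdout fallback.
      const dir = flagValue(args, ["--output-dir"]);
      const fmt = flagValue(args, ["--output-format"]);
      return fmt === "txt" && dir ? `${dir}/${stem(mediaPath)}.txt` : null;
    }
    default:
      // Unknown commands are not authoritative; stdout fallback applies.
      return null;
  }
}
```

A `null` result means "no authoritative file", which is exactly the case where the existing stdout fallback is kept.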

Fixes #87384.

Why

Some local speech-to-text wrappers print progress banners to stdout while writing the actual transcript to a file. When that transcript file is empty or missing, OpenClaw could previously treat the progress banner as the user transcript. That is wrong for live audio messages because wrapper status text can leak into the inbound conversation body.

Real behavior proof

  • Behavior or issue addressed: local Whisper-style CLI wrapper status text could be injected as an audio transcript when the expected transcript file was empty or missing.

  • Real environment tested: refreshed PR head 3f383768675217b456189d972867a1d0bdde7858, disposable OpenClaw staging state, stored Telegram DM voice media (audio/ogg) from @kesslerClawBot staging proof, and a controlled local-whisper-compatible wrapper that printed Whisper progress to stdout, wrote an empty inferred transcript file under {{OutputDir}}, and exited successfully.

  • Exact steps or command run after this patch: merged current origin/main into the PR branch, then ran a current-head direct media runtime proof through src/media-understanding/runner.entries.ts / runCliEntry with node <proof-wrapper> <stored-telegram-audio.ogg> --output-dir <tmp>. The same wrapper/media path was previously exercised through the disposable Telegram gateway with @kesslerClawBot.

  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): public proof gist with current-head direct runtime output plus the redacted live Telegram gateway excerpt: https://gist.github.com/kesslerio/97110dff154bef75ddb80f35c5cdc8e8

    PR #87393 current-head runCliEntry proof
    head=3f383768675217b456189d972867a1d0bdde7858
    media=<staging-state>/media/inbound/ec282e95-e8cd-4472-b8ce-d15d73cac548.ogg
    wrapper=<staging-state>/proof/local-whisper/transcribe.js
    
    result null
    PASS: runCliEntry returned null; local-whisper wrapper progress stdout was not exposed as transcript text.

    Prior live Telegram gateway excerpt for the same wrapper behavior:

    audio-preflight: transcribing attachment 0 for mention check
    Media understanding via CLI: node <staging-state>/proof/local-whisper/transcribe.js <staging-state>/media/inbound/ec282e95...ogg --output-dir /tmp/openclaw/openclaw-media-cli-...
    Whisper Voice Transcription
    Model: proof
    Transcribing with Whisper...
    Media understanding audio: skipped (0/1) reason=empty output
    telegram inbound: chatId=<redacted> from=telegram:<redacted> len=92 preview="[Telegram ...] <media:audio>"
  • Observed result after fix: current-head runCliEntry returned null for the empty authoritative transcript file; in the live Telegram proof, OpenClaw logged wrapper progress output but kept the inbound Telegram body as <media:audio>. The progress stdout was not injected as transcript text.

  • What was not tested: real Whisper model transcription quality and real upstream parakeet-mlx binary execution were not tested in the staging proof. The parakeet compatibility path is covered by focused regression tests instead.

  • Proof limitations or environment constraints: the current-head proof uses a controlled wrapper to force the empty-output condition deterministically. It proves OpenClaw's wrapper-output handling on the changed source path; the live Telegram proof was captured before the current-main merge, but the refreshed PR diff still only changes the same five media/doc/test files against current origin/main.

  • Before evidence (optional but encouraged): issue #87384, "Bug: CLI audio transcription can use progress stdout when transcript file is empty", describes the reported failure mode; current main fell back from missing or empty inferred transcript files to stdout for known CLI paths.

Validation

pnpm exec oxfmt --check --threads=1 docs/nodes/audio.md docs/nodes/media-understanding.md src/media-understanding/apply.test.ts src/media-understanding/runner.cli-audio.test.ts src/media-understanding/runner.entries.ts
git diff --check origin/main...HEAD
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts -t "parakeet|whisper-cli inferred|inferred whisper transcript|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts

Result:

  • formatting passed
  • diff whitespace check passed
  • focused parakeet/Whisper/local-wrapper tests passed: 9 tests, 4 skipped
  • broader media-understanding audio files passed: 65 tests
  • current-head direct runtime proof passed and is linked above

Validation limitation:

  • timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo failed after the current-main merge on declaration errors in unrelated current-main files for proper-lockfile, cross-spawn, and hosted-git-info. No media-understanding files were reported in that failure.

CI Status

CI is running on current head 3f383768675217b456189d972867a1d0bdde7858.

The branch now includes current origin/main (7d5d62511f) and contains scripts/github/dependency-guard.mjs, so the previous Dependency Guard failure from checking out 53ad531df9 without that script should be cleared by this head.

Maintainer Edits

Maintainer edits are enabled on this fork PR.

AI Assistance

AI-assisted implementation and review. Human-run proof was captured through the disposable staging gateway and supplemented with a current-head direct runtime proof. Local review included scoped simplify/code-review passes and focused media-understanding validation.


Compound Engineering
Codex

@openclaw-barnacle openclaw-barnacle Bot added the size: M and proof: supplied labels May 27, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

Codex review: needs maintainer review before merge. Reviewed June 1, 2026, 11:52 PM ET / 03:52 UTC.

Summary
The PR changes media CLI audio output resolution, tests, and docs so deterministic transcript-file outputs are treated as authoritative instead of falling back to progress stdout.

PR surface: Source +30, Tests +337, Docs 0. Total +367 across 5 files.

Reproducibility: yes, at source level. Current main reads inferred transcript files only when they are non-empty and otherwise falls through to stdout, while the linked issue gives a concrete wrapper/progress-stdout reproduction shape. I did not execute a failing current-main run because this review is read-only.

Review metrics: 1 noteworthy metric.

  • CLI output-source policy: 4 CLI path families changed. whisper-cli, whisper, explicit-txt parakeet-mlx, and local-whisper node wrappers stop using stdout fallback when the expected transcript file is empty or missing.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Get maintainer sign-off on the fail-closed transcript-file behavior and the local-whisper wrapper exception before merge.

Risk before merge

  • [P1] Known file-output CLI entries and the local-whisper node wrapper now fail closed on empty or missing inferred transcript files and throw unexpected transcript-file read errors instead of falling back to stdout, which can change existing local setups that relied on stdout fallback.
  • [P1] The local-whisper handling is path/name based in core rather than a documented generic CLI output contract, so maintainers should decide whether that external-wrapper exception belongs in core or should stay a separate plugin/skill contract.

Maintainer options:

  1. Accept the file-output contract change
    Maintainers can land this once they agree that empty or missing deterministic transcript files should fail closed for the covered CLI paths despite stdout fallback compatibility changes.
  2. Narrow the external-wrapper exception
    Before merge, remove or redesign the local-whisper path match if maintainers do not want core to recognize a skill-specific wrapper name.
  3. Pause for a generic output policy
    If custom wrapper support needs a durable contract, pause this PR and design an explicit CLI output-source policy separately from the known-command bug fix.

Next step before merge

  • [P2] Maintainers need to accept the compatibility change and local-wrapper boundary; there is no narrow automation repair request from this review.

Security
Cleared: No concrete security or supply-chain regression was found; the diff changes media runner output selection, focused tests, and docs without new dependencies, workflows, or secret handling.

Review details

Best possible solution:

Land a focused media-runner fix that makes deterministic file-output STT paths authoritative, with explicit maintainer acceptance or removal of the local-whisper wrapper exception before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main reads inferred transcript files only when they are non-empty and otherwise falls through to stdout, while the linked issue gives a concrete wrapper/progress-stdout reproduction shape. I did not execute a failing current-main run because this review is read-only.

Is this the best way to solve the issue?

Mostly yes: making known deterministic transcript-file outputs authoritative is the narrowest repair for the bug. The local-whisper path-based exception is the remaining owner-boundary and compatibility decision before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 39f319c7a47e.

Label changes


  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body and linked gist provide current-head runtime output plus redacted live Telegram gateway logs showing progress stdout was not exposed as transcript text after the patch.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body and linked gist provide current-head runtime output plus redacted live Telegram gateway logs showing progress stdout was not exposed as transcript text after the patch.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority media transcription correctness fix with limited blast radius in CLI audio understanding.
  • merge-risk: 🚨 compatibility: Merging changes existing CLI fallback behavior for empty, missing, or unreadable deterministic transcript-file outputs.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body and linked gist provide current-head runtime output plus redacted live Telegram gateway logs showing progress stdout was not exposed as transcript text after the patch.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and linked gist provide current-head runtime output plus redacted live Telegram gateway logs showing progress stdout was not exposed as transcript text after the patch.
Evidence reviewed

PR surface:

Source +30, Tests +337, Docs 0. Total +367 across 5 files.

Area        Files   Added   Removed   Net
Source      1       41      11        +30
Tests       2       346     9         +337
Docs        2       2       2         0
Config      0       0       0         0
Generated   0       0       0         0
Other       0       0       0         0
Total       5       389     22        +367

What I checked:

  • Current main fallback behavior: Current main only returns an inferred transcript file when the file exists and has non-empty trimmed content; empty or missing files fall through to stdout trimming. (src/media-understanding/runner.entries.ts:217, 39f319c7a47e)
  • PR authoritative output behavior: The PR head resolves known file-output paths, treats any resolved file path as authoritative, returns empty output on ENOENT, and throws unexpected read errors instead of falling back to stdout. (src/media-understanding/runner.entries.ts:237, 3f3837686752)
  • Runtime entry point checked: runCliEntry creates the temporary OutputDir/OutputBase, templates CLI args, runs the command, and passes stdout plus the resolved media path into resolveCliOutput before deciding whether to return an audio transcription. (src/media-understanding/runner.entries.ts:924, 3f3837686752)
  • Regression coverage added: The PR adds focused coverage for empty/missing whisper-cli, whisper, explicit-txt parakeet-mlx, local-whisper node wrapper behavior, parakeet omitted-format stdout fallback, other node-wrapper stdout fallback, and unexpected read errors. (src/media-understanding/runner.cli-audio.test.ts:179, 3f3837686752)
  • Docs updated for parakeet compatibility: The docs now say parakeet-mlx reads the inferred .txt file only when --output-format txt is explicit, preserving stdout fallback for omitted or non-txt formats. Public docs: docs/nodes/media-understanding.md. (docs/nodes/media-understanding.md:300, 3f3837686752)
  • Real behavior proof inspected: The linked public gist shows current-head runCliEntry returning null for a controlled local-whisper-compatible wrapper that printed progress stdout and wrote an empty authoritative transcript file, plus a redacted Telegram gateway excerpt where the inbound body stayed as media:audio. (3f3837686752)
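The read semantics summarized in these checks (a resolved file is authoritative, ENOENT counts as empty, other read errors are rethrown, stdout is used only when no file path was resolved) can be illustrated with a small sketch. `resolveTranscript` and its injected `readText` are hypothetical names, not OpenClaw's code; in Node one could pass `(p) => readFileSync(p, "utf8")`. Returning `null` for empty text mirrors the "runCliEntry returned null" result from the proof.

```typescript
// Hypothetical sketch of the authoritative-file read policy; not OpenClaw's implementation.
type ReadText = (path: string) => string;

function resolveTranscript(
  stdout: string,
  fileOutputPath: string | null,
  readText: ReadText,
): string | null {
  let text: string;
  if (fileOutputPath === null) {
    // No inferred transcript file: keep the legacy stdout fallback.
    text = stdout.trim();
  } else {
    try {
      // The file is authoritative: its trimmed contents win, even when empty.
      text = readText(fileOutputPath).trim();
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      // Unexpected read errors still surface to the caller.
      if (code !== "ENOENT") throw err;
      // A missing file means "no transcript", never "use stdout".
      text = "";
    }
  }
  return text === "" ? null : text;
}
```

Keeping the reader injectable makes the empty, missing, and unreadable cases easy to exercise without touching the filesystem.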

Likely related people:

  • steipete: GitHub path history shows repeated recent media runner and media core refactor work touching the central files and adjacent media execution contracts. (role: recent area contributor; confidence: high; commits: 242eab9d20f7, 77f1359612f6, 252a76d25c93; files: src/media-understanding/runner.entries.ts, src/media-understanding/runner.ts)
  • TurboTheTurtle: Authored the nearby empty Sherpa structured transcript fix in the same CLI audio output path, making them relevant for empty-output semantics. (role: recent adjacent bugfix author; confidence: medium; commits: 46030f548986; files: src/media-understanding/runner.entries.ts, src/media-understanding/runner.cli-audio.test.ts, src/media-understanding/apply.test.ts)
  • Spolen23: GitHub history shows earlier infer CLI reliability work with media-understanding runner tests, which is adjacent to the output-source contract this PR changes. (role: earlier media CLI reliability contributor; confidence: medium; commits: 12aaef9035ea; files: src/media-understanding/runner.cli-audio.test.ts, src/media-understanding/runner.entries.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc49c6db76

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

: commandId === "node"
? resolveNodeWhisperWrapperOutputPath(params.args, params.mediaPath)
: null;
const fileOutputAuthoritative = commandId === "node" && fileOutputPath !== null;

P2: Don't fall back from inferred file outputs

When a file path is inferred for whisper-cli, whisper, or parakeet-mlx, fileOutputAuthoritative remains false, so an empty transcript file skips the return here and falls through to params.stdout.trim() later. In configurations such as whisper ... --output_format txt --output_dir {{OutputDir}} where stdout contains progress or banner text, that status text is still injected as the transcript, leaving the file-output CLI regression unfixed for the built-in paths this runner already recognizes.
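The difference the review points at can be shown with a minimal sketch. The first predicate restates the `fileOutputAuthoritative` line from the reviewed snippet; the second is the direction the review suggests (and the review summary says the later head adopted: any resolved file path is authoritative). `pickOutput` is a hypothetical, simplified model of the current-main selection, where a non-authoritative empty file falls through to stdout.

```typescript
// Reviewed commit: only the node wrapper was authoritative (from the snippet above).
function isAuthoritativeBefore(commandId: string, fileOutputPath: string | null): boolean {
  return commandId === "node" && fileOutputPath !== null;
}

// Suggested direction: any inferred transcript path is authoritative.
function isAuthoritativeAfter(fileOutputPath: string | null): boolean {
  return fileOutputPath !== null;
}

// Hypothetical, simplified selection: a non-authoritative empty file falls through to stdout.
function pickOutput(authoritative: boolean, fileText: string, stdout: string): string {
  const fromFile = fileText.trim();
  if (authoritative || fromFile !== "") return fromFile;
  return stdout.trim(); // this is where progress/banner text can leak in
}
```

With `whisper` and an empty transcript file, the "before" predicate returns the banner text while the "after" predicate returns an empty transcript.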


@clawsweeper clawsweeper Bot added the rating: 🧂 unranked krab and status: 📣 needs proof labels May 27, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

ClawSweeper PR egg

✨ Hatched: 🥚 common Brave Merge Sprite

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: keeps receipts.
Image traits: location proof lagoon; accessory CI status badge; palette amber, ink, and glacier blue; mood curious; pose leaning over a miniature review desk; shell translucent glimmer shell; lighting warm desk-lamp glow; background delicate sparkle particles.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@kesslerio
Contributor Author

Updated this PR after ClawSweeper feedback.

Head: ec4e753514fbfc9f129a84f67f18c7b3f422a8b3

What changed:

  • Broadened the fix from local-whisper-only to the deterministic inferred transcript-file contract.
  • Any CLI audio path where OpenClaw can infer a .txt transcript output path now treats that file as authoritative.
  • Empty or missing inferred transcript files no longer fall back to progress stdout for whisper-cli, whisper, parakeet-mlx, or the local-whisper node wrapper.
  • Explicit non-txt/stdout modes still preserve stdout fallback.

Reviewer/bot feedback addressed:

  • ClawSweeper P2: “Make inferred file outputs authoritative, not local-whisper only.”

Proof:

  • public artifact or excerpt: the PR diff now includes focused regression tests for whisper-cli, whisper, parakeet-mlx, and local-whisper wrapper behavior in src/media-understanding/runner.cli-audio.test.ts.
  • raw local audit source, if any: none.
  • proof limitation, if any: I attempted an ad-hoc real wrapper proof through runCliEntry, but the tsx --eval proof script timed out before output. So the current pushed proof is still focused test/typecheck proof, not live audio/gateway proof. A redacted real wrapper transcript or real audio run is still the remaining proof upgrade.

Validation:

pnpm exec oxfmt --write src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
pnpm exec oxfmt --check --threads=1 src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 150s node scripts/run-vitest.mjs src/media-understanding/runner.cli-audio.test.ts -t "whisper-cli inferred|inferred whisper transcript|parakeet-mlx inferred|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo

Result:

  • formatter write/check passed
  • git diff --check passed
  • focused Vitest passed: 1 file, 8 tests passed, 4 skipped
  • core test typecheck passed

Current state:

  • conflicts: none known before this push; CI will verify
  • CI: running
  • proof: supplied but still needs real behavior proof for ClawSweeper clearance
  • rating target: platinum minimum, diamond preferred after real proof
  • current rating before this push: unranked krab because patch scope was too narrow and proof was unit-only
  • remaining blocker, if any: real behavior proof from a local wrapper or real audio transcription path

@clawsweeper re-review

@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@kesslerio
Contributor Author

Proof: real Telegram audio path at PR head ec4e753

I added a channel-level proof pass after the earlier focused test/typecheck proof.

Setup:

  • disposable OpenClaw staging gateway, started from this PR head
  • Telegram provider: @kesslerClawBot
  • inbound surface: Telegram DM voice message
  • audio media model: CLI node .../local-whisper/transcribe.js {{MediaPath}} --output-dir {{OutputDir}}
  • proof wrapper behavior: prints typical Whisper progress to stdout, writes an empty inferred transcript file, exits successfully
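The proof wrapper's behavior described in this setup can be sketched in TypeScript. This is a hypothetical reconstruction, not the actual `transcribe.js` used in the proof: it prints the same progress lines that appear in the gateway log, writes an empty transcript, and exits 0. The `<output-dir>/<media stem>.txt` file name is an assumption, since the inferred name for the local-whisper wrapper is not stated here. Side effects are injected so the logic can be checked without a filesystem.

```typescript
// Hypothetical proof-wrapper logic; file naming is an assumption.
interface WrapperIo {
  log: (line: string) => void; // e.g. console.log
  writeText: (path: string, text: string) => void; // e.g. fs.writeFileSync
}

function runProofWrapper(argv: readonly string[], io: WrapperIo): number {
  const mediaPath = argv[0];
  const dirIndex = argv.indexOf("--output-dir");
  const outputDir = dirIndex >= 0 ? argv[dirIndex + 1] : undefined;
  if (!mediaPath || !outputDir) {
    io.log("usage: transcribe <media> --output-dir <dir>");
    return 2;
  }
  // Progress banner on stdout that must never become the transcript.
  io.log("Whisper Voice Transcription");
  io.log("Model: proof");
  io.log("Transcribing with Whisper...");
  const base = mediaPath.slice(mediaPath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const stem = dot > 0 ? base.slice(0, dot) : base;
  // Empty transcript file plus a successful exit code forces the empty-output path.
  io.writeText(`${outputDir}/${stem}.txt`, "");
  return 0;
}
```

To run it as a real script, one could call `process.exit(runProofWrapper(process.argv.slice(2), { log: console.log, writeText: (p, t) => writeFileSync(p, t) }))` with `writeFileSync` imported from `node:fs`.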

Redacted gateway transcript:

2026-05-27T13:05:57.270-07:00 [telegram] [default] starting provider (@kesslerClawBot)
2026-05-27T13:06:00.435-07:00 [telegram] update: {... "voice":{"duration":6,"mime_type":"audio/ogg","file_size":125925}}
2026-05-27T13:06:03.048-07:00 audio-preflight: transcribing attachment 0 for mention check
2026-05-27T13:06:03.064-07:00 Media understanding via CLI: node .../proof/local-whisper/transcribe.js .../media/inbound/<redacted>.ogg --output-dir /tmp/openclaw/openclaw-media-cli-<redacted>
2026-05-27T13:06:03.151-07:00 Whisper Voice Transcription
Model: proof
Transcribing with Whisper...
2026-05-27T13:06:03.162-07:00 Media understanding audio: skipped (0/1) reason=empty output
2026-05-27T13:06:03.182-07:00 telegram inbound: ... preview="[Telegram Art Kessler (@kesslerIO) id:8352721935 +1h Wed 2026-05-27 13:05 PDT] <media:audio>"
2026-05-27T13:06:03.191-07:00 [telegram] Inbound message telegram:<redacted> -> @kesslerClawBot (direct, audio/ogg, 13 chars)
2026-05-27T13:06:03.809-07:00 Media understanding via CLI: node .../proof/local-whisper/transcribe.js .../media/inbound/<redacted>.ogg --output-dir /tmp/openclaw/openclaw-media-cli-<redacted>
2026-05-27T13:06:03.918-07:00 Whisper Voice Transcription
Model: proof
Transcribing with Whisper...
2026-05-27T13:06:03.926-07:00 Media understanding audio: skipped (0/1) reason=empty output

What this proves:

  • the real Telegram DM/audio path downloaded the voice note and invoked media understanding from this PR head
  • the wrapper produced stdout progress (Whisper Voice Transcription, Transcribing with Whisper...)
  • because the inferred transcript file was empty, OpenClaw treated it as authoritative empty output and skipped transcription
  • the message body stayed as <media:audio>; progress stdout was not injected as transcript text

Additional direct CLI check against the same disposable staging config:

OPENCLAW_CONFIG_PATH=.../openclaw.json OPENCLAW_STATE_DIR=.../state \
  node scripts/run-node.mjs infer audio transcribe --file /tmp/openclaw-whisper-proof-o3cW2g/sample.wav --json

Error: No transcript returned for audio: /tmp/openclaw-whisper-proof-o3cW2g/sample.wav

That direct check also confirms stdout progress did not become a transcript.

Validation already run on this head:

pnpm exec oxfmt --check --threads=1 src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 150s node scripts/run-vitest.mjs src/media-understanding/runner.cli-audio.test.ts -t "whisper-cli inferred|inferred whisper transcript|parakeet-mlx inferred|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo

@clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof label and removed the proof: supplied label May 27, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient and status: ⏳ waiting on author labels and removed the status: 📣 needs proof label May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient label May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot added the proof: supplied label and removed the triage: needs-real-behavior-proof label May 27, 2026
@kesslerio
Contributor Author

Updated after the latest ClawSweeper review.

Head: ba1dbe8bc36a1ebf53b9a82011fcc9993818e0e6

What changed:

  • Restored stdout fallback for parakeet-mlx --output-dir when --output-format txt is omitted.
  • Kept authoritative .txt file behavior only for explicit parakeet txt output.
  • Added the focused regression for omitted --output-format.
  • Updated the PR body to use the exact real-behavior proof fields parsed by CI.

Feedback addressed:

  • ClawSweeper P1: keep parakeet defaults out of authoritative txt mode.
  • Real behavior proof CI: required field labels are now present and the local policy check passes.

Validation:

pnpm exec oxfmt --check --threads=1 src/media-understanding/apply.test.ts src/media-understanding/runner.entries.ts src/media-understanding/runner.cli-audio.test.ts
git diff --check
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts -t "parakeet|whisper-cli inferred|inferred whisper transcript|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts
timeout -k5s 180s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
node --input-type=module <local real-behavior-proof-policy check against PR body>

Result:

  • formatting and git diff --check passed
  • focused parakeet/Whisper/local-wrapper tests passed: 9 passed, 4 skipped
  • broader media-understanding audio files passed: 65 passed
  • core TypeScript check passed
  • local real-behavior proof policy evaluation passed

Current state:

  • CI: running on the new head, no current failures visible at the time of this comment
  • proof: supplied; PR body now uses parseable template fields
  • remaining blocker expected from previous review: addressed

@clawsweeper re-review

@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@clawsweeper clawsweeper Bot added the proof: sufficient, rating: 🦐 gold shrimp, P2, and merge-risk: 🚨 compatibility labels and removed the rating: 🧂 unranked krab label May 27, 2026
@kesslerio
Contributor Author

Addressed the ClawSweeper P2 docs feedback on head f85508ba3b.

What changed:

  • Updated docs/nodes/media-understanding.md and docs/nodes/audio.md so the parakeet note now says <output-dir>/<media-basename>.txt is read only when --output-format txt is explicit.
  • Clarified that omitted or non-txt parakeet output formats use stdout fallback.
  • No runtime behavior changed in this follow-up commit.
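A minimal sketch of the documented parakeet rule follows. It assumes only the space-separated --output-format txt and --output-dir <dir> flag forms, and that <media-basename> means the media file name with its extension stripped. inferParakeetTranscriptPath is a hypothetical helper for illustration, not the code in src/media-understanding/runner.entries.ts:

```typescript
import * as path from "node:path";

// Hypothetical helper mirroring the documented parakeet rule:
// the transcript file is authoritative only when --output-format txt is explicit.
export function inferParakeetTranscriptPath(
  args: readonly string[],
  mediaPath: string,
): string | null {
  let format: string | undefined;
  let outputDir: string | undefined;
  // Assumption: only the space-separated flag forms are recognized here.
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--output-format") format = args[++i];
    else if (arg === "--output-dir") outputDir = args[++i];
  }
  // Omitted or non-txt format (or no output dir): no authoritative file,
  // so the caller keeps the stdout fallback.
  if (format !== "txt" || outputDir === undefined) return null;
  // Assumption: <media-basename> is the file name without its extension.
  const base = path.basename(mediaPath, path.extname(mediaPath));
  return path.join(outputDir, `${base}.txt`);
}
```

With --output-format json, or with no format flag at all, the sketch returns null and the stdout fallback still applies.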

Validation:

pnpm exec oxfmt --check --threads=1 docs/nodes/audio.md docs/nodes/media-understanding.md
git diff --check
pnpm test src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts
pnpm check:docs

Results:

  • formatting and diff whitespace checks passed
  • media-understanding tests passed: 2 files, 65 tests
  • docs checks passed: format, markdownlint, MDX, i18n glossary, links

This should address the remaining author-facing blocker from the durable ClawSweeper review.

@clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot added the docs Improvements or additions to documentation label May 27, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@kesslerio
Contributor Author

@clawsweeper re-review

Current head 658e605b17fdada9b985d2c3902cd3d1ebe1b980 is merged with current origin/main; the previously red CI is now green, and the PR body CI section has been updated with the merge-refresh validation.

@clawsweeper
Contributor

clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@kesslerio
Contributor Author

@clawsweeper re-review

Current head 3f383768675217b456189d972867a1d0bdde7858 is refreshed against current origin/main (7d5d62511f). The previous Dependency Guard failure came from a run that checked out 53ad531df9, where scripts/github/dependency-guard.mjs did not exist; this head contains that script.

What changed since the last review:

  • merged current origin/main into the PR branch
  • kept the PR diff scoped to the same five media/doc/test files
  • updated the PR body proof section to use the parseable Real behavior proof fields and the current head SHA
  • added a public proof gist with current-head direct runtime output plus the redacted live Telegram gateway excerpt: https://gist.github.com/kesslerio/97110dff154bef75ddb80f35c5cdc8e8

Validation run locally:

pnpm exec oxfmt --check --threads=1 docs/nodes/audio.md docs/nodes/media-understanding.md src/media-understanding/apply.test.ts src/media-understanding/runner.cli-audio.test.ts src/media-understanding/runner.entries.ts
git diff --check origin/main...HEAD
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts -t "parakeet|whisper-cli inferred|inferred whisper transcript|local whisper|node wrapper"
timeout -k5s 180s node scripts/run-vitest.mjs run --config test/vitest/vitest.media-understanding.config.ts --reporter=verbose src/media-understanding/runner.cli-audio.test.ts src/media-understanding/apply.test.ts

Results:

  • formatting passed
  • diff whitespace check passed
  • focused parakeet/Whisper/local-wrapper tests passed: 9 passed, 4 skipped
  • broader media-understanding tests passed: 65 passed
  • current-head direct proof passed: runCliEntry returned null, so wrapper progress stdout was not exposed as transcript text
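The authoritative-file rule that the direct proof exercises (runCliEntry returning null instead of progress stdout) can be sketched as below. This is a hypothetical illustration, not the actual runCliEntry code: CliRunResult, authoritativeTranscriptPath, and resolveTranscript are made-up names, and trimming whitespace is an assumption. It follows the rule stated in the PR body: an empty or missing inferred transcript file yields no transcript, other read errors still surface, and only non-authoritative paths use stdout.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical result shape; the real runner types may differ.
interface CliRunResult {
  // Raw stdout from the transcription command (may contain progress banners).
  stdout: string;
  // Non-null only when the command line makes a transcript file authoritative,
  // e.g. whisper-cli with -otxt and -of <base>.
  authoritativeTranscriptPath: string | null;
}

// Hypothetical helper: returns transcript text, or null for "no transcript".
export async function resolveTranscript(result: CliRunResult): Promise<string | null> {
  if (result.authoritativeTranscriptPath === null) {
    // Non-authoritative CLI paths keep the existing stdout fallback.
    const fromStdout = result.stdout.trim();
    return fromStdout.length > 0 ? fromStdout : null;
  }

  let fileText: string;
  try {
    fileText = await fs.readFile(result.authoritativeTranscriptPath, "utf8");
  } catch (err) {
    // A missing inferred file means "no transcript"; stdout is never used here.
    if ((err as { code?: unknown }).code === "ENOENT") return null;
    // Any other read error (permissions, EISDIR, ...) still surfaces.
    throw err;
  }

  // An empty inferred file also means "no transcript".
  const fromFile = fileText.trim();
  return fromFile.length > 0 ? fromFile : null;
}
```

Treating ENOENT separately from other errors keeps a silent, empty run silent while a misconfigured setup (for example, an unreadable output directory) still fails visibly.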

One validation limitation: the broad core tsgo command now fails on unrelated current-main declaration issues for proper-lockfile, cross-spawn, and hosted-git-info; no media-understanding files are involved in that failure.

@clawsweeper
Contributor

clawsweeper Bot commented Jun 2, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 2, 2026

Labels

docs Improvements or additions to documentation merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: CLI audio transcription can use progress stdout when transcript file is empty

2 participants