Pseudo-streaming:
- Hysteresis silence detection: the level must exceed HIGH (-25 dB)
to start an utterance, and only counts as silence once it drops
below LOW (-40 dB) inside one. Keeps ambient noise from opening
spurious chunks and avoids clipping trailing syllables at chunk ends.
- Commit on every detected pause; the session audio queue is
preserved across cuts.
- 1.5 s minimum chunk duration suppresses Whisper hallucinations
on sub-second clips ("Thanks for watching!" etc.).
- Cross-chunk prompt context: each chunk's transcription feeds the
next chunk's initial_prompt for capitalization / article gender /
language stability. Dropped after pauses >1.5 s so a bad chunk
can't poison the rest of the recording.
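
A minimal sketch of the two-threshold gate described above, not the
code in this commit. The function name and the per-frame dB input are
hypothetical, and it closes an utterance on the first frame below LOW,
whereas a real detector would presumably also require a minimum
silence duration:

```python
from typing import Iterable, List, Optional, Tuple

START_DB = -25.0  # HIGH: level must exceed this to open an utterance
STAY_DB = -40.0   # LOW: level must fall below this to close one


def segment_frames(levels_db: Iterable[float]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, per utterance.

    Hypothetical helper: one dB level per frame, no minimum silence
    length, no minimum chunk duration.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    count = 0
    for i, level in enumerate(levels_db):
        count = i + 1
        if start is None:
            # Outside an utterance: only a loud frame opens a chunk.
            if level > START_DB:
                start = i
        elif level < STAY_DB:
            # Inside an utterance: only a quiet frame closes it.
            segments.append((start, i))
            start = None
    if start is not None:
        # Input ended mid-utterance; close the open chunk at the end.
        segments.append((start, count))
    return segments
```

Frames between -40 dB and -25 dB neither open nor close a chunk, which
is what keeps ambient noise out and trailing syllables in.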
whisper-futo:
- Text-layer non-speech filter ("(music)", "[Applause]", etc.).
- max_tokens cap bounds decoder repetition loops on short chunks.
- initial_prompt and --words now wired through.
- Reverted the q8_0 turbo model (large-v3 encoder incompatible
with ACFT audio_ctx shrinkage).
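
As an illustration of the text-layer filter only: the sketch below
assumes the filter strips any parenthesised or bracketed annotation
from the transcript. The pattern and function name are hypothetical;
the real filter may match a fixed list of markers instead.

```python
import re

# Assumption: any "(...)" or "[...]" span is a non-speech annotation.
_ANNOTATION = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")


def strip_non_speech(text: str) -> str:
    """Drop bracketed annotations and collapse leftover whitespace."""
    cleaned = _ANNOTATION.sub(" ", text)
    return " ".join(cleaned.split())
```

A chunk that contains nothing but annotations comes back as an empty
string, which the caller can treat as "no speech".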
OpenAI realtime (gpt-realtime-whisper):
- Stop sending the prompt field; the model rejects it server-side.
The kwarg stays accepted for plumbing compatibility.
- Coalesce per-token deltas before yielding so paste_via_clipboard
on Wayland doesn't race itself. Flush at a ~400 ms cadence or on
sentence-final punctuation, with a 200 ms floor between any two
flushes.
- Bypass coalescing entirely under --type-direct (no clipboard, no
race to defeat).
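
One possible reading of the coalescing rule, as a sketch rather than
the shipped implementation. The generator name, the injectable clock,
and the type_direct parameter are hypothetical, and it assumes the
200 ms floor also applies to punctuation-triggered flushes:

```python
import time
from typing import Callable, Iterable, Iterator

CADENCE_S = 0.4  # flush roughly every 400 ms
FLOOR_S = 0.2    # never flush twice within 200 ms
SENTENCE_END = (".", "!", "?")


def coalesce(deltas: Iterable[str],
             clock: Callable[[], float] = time.monotonic,
             type_direct: bool = False) -> Iterator[str]:
    """Batch per-token deltas into fewer, larger yields."""
    if type_direct:
        # No clipboard involved, so pass deltas through untouched.
        yield from deltas
        return
    buf = ""
    last_flush = clock()  # the floor is also measured from stream start
    for delta in deltas:
        buf += delta
        elapsed = clock() - last_flush
        sentence_done = buf.rstrip().endswith(SENTENCE_END)
        if elapsed >= CADENCE_S or (sentence_done and elapsed >= FLOOR_S):
            yield buf
            buf = ""
            last_flush = clock()
    if buf:
        # Stream ended: flush whatever is left.
        yield buf
```

Passing the clock as a parameter is only there so the timing can be
exercised without sleeping.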
Session loop:
- Don't drop queued audio after a silence-cut: slow backends were
losing the user's next words during the round-trip.
Documentation:
- docs/backends.md vocabulary table now lists whisper-futo and
correctly describes realtime's prompt handling.
- docs/backends.md pseudo-streaming section documents the
cross-chunk context behaviour and pause-based reset.
- docs/keyboard.md final note explains realtime delta coalescing
and why --type-direct bypasses it.