Make local STT a streaming runtime service #11

Merged: qyinm merged 35 commits into main from qyinm-stt-runtime-service-boundary on May 3, 2026
@qyinm (Owner) commented May 3, 2026

Summary

Moves local STT behind a warm streaming runtime service with a default worker-process boundary, startup/model-change warmup, and event-driven segment output.

Why

Long recordings should not be transcribed as one large batch. The STT runtime needs to stay warm, reuse model/VAD context, process streamed audio by speech segment, and emit progress/final events continuously.

Changes

  • Adds process-worker STT runtime boundary with inline fallback via MIRRORNOTE_STT_DISABLE_PROCESS_WORKER=1.
  • Warms Whisper and Silero ONNX VAD contexts on app startup, model download, and settings changes.
  • Streams PCM chunks through VAD speech segmentation and emits final segment, progress, cancel, error, and session-finished events.
  • Keeps Electron focused on orchestration: capture/file IO, PCM streaming, and transcript/UI updates.
  • Adds backend/worker contracts for future MLX Whisper, faster-whisper, remote STT, or diarization backends.
  • Verified with Rust tests, desktop checks, Metal process-worker smoke, and a 1-hour silence streaming stress test.
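
The inline fallback in the first bullet can be pictured as a single switch. This is a minimal sketch, assuming the variable is compared against the string "1" wherever it is read; the helper name `selectSttWorkerMode` and the mode labels are hypothetical, and only the variable name comes from the description above.

```javascript
// Hypothetical helper; only the environment variable name is taken from the PR.
function selectSttWorkerMode(env = process.env) {
  // "1" opts out of the worker process and runs transcription inline.
  return env.MIRRORNOTE_STT_DISABLE_PROCESS_WORKER === "1" ? "inline" : "process";
}

console.log(selectSttWorkerMode({ MIRRORNOTE_STT_DISABLE_PROCESS_WORKER: "1" })); // inline
console.log(selectSttWorkerMode({})); // process
```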

Summary by CodeRabbit

  • New Features

    • Real-time partial and final STT segments appear live during transcription and are written to the transcript file as they arrive.
    • Transcription now reports progress and session state updates during processing.
  • Bug Fixes / Reliability

    • More robust streaming transcription with improved error handling and fallback finalization.
    • Proactive STT warmup for faster availability on startup and after settings changes.

coderabbitai Bot commented May 3, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: 366d84dd-6b62-418d-ac79-5512beb184d2

📥 Commits

Reviewing files that changed from the base of the PR and between cf450b4 and 71cee3f.

📒 Files selected for processing (1)
  • apps/desktop/main.cjs

Walkthrough

The PR converts local STT from one-shot file transcription to a streaming session model. The desktop app streams PCM chunks, receives incremental segment and progress events, and writes live transcripts. The Rust STT server is refactored into an event-driven runtime with worker abstractions, streaming job queues, and session lifecycle events. A new warmLocalSTTServer function warms the server on app startup, after model download, and after settings updates.
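
The session model described here can be sketched from the desktop side roughly as follows. This is an illustration, not the code in main.cjs: `bridge` is a hypothetical object exposing `request(command, payload)` and `onEvent(handler)`, and the payload and event field names (`sessionId`, `samples`, `type`, `segment`) are assumptions. Only the command names start_session, push_audio_chunk, and finalize_session and the final_segment event name appear in the walkthrough.

```javascript
// Sketch of a streaming transcription session against a hypothetical bridge.
async function transcribeStreaming(bridge, chunks) {
  const streamed = [];
  // Collect final segments as they arrive as events.
  const unsubscribe = bridge.onEvent((event) => {
    if (event.type === "final_segment") streamed.push(event.segment);
  });
  try {
    const { sessionId } = await bridge.request("start_session", {});
    // `chunks` may be a sync or async iterable of Float32Array sample blocks.
    for await (const samples of chunks) {
      await bridge.request("push_audio_chunk", { sessionId, samples });
    }
    const final = await bridge.request("finalize_session", { sessionId });
    // Prefer streamed segments; fall back to whatever the final response carries.
    return streamed.length > 0 ? streamed : (final.segments ?? []);
  } finally {
    unsubscribe();
  }
}
```

Cancellation on error is left out of the sketch; the walkthrough mentions it but this page does not show the command used for it.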

Changes

Desktop app: streaming integration & wiring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config / Constants | apps/desktop/main.cjs | Adds LOCAL_STT_STREAM_CHUNK_SECONDS = 5 to control streaming chunk duration. |
| API Surface | apps/desktop/main.cjs (lines 3174–3180) | transcribeRecordingFile(recordingPath, transcriptionConfiguration) now accepts an options = {} object and forwards it to the local STT caller. |
| Streaming Flow (core) | apps/desktop/main.cjs (lines 3225–3297, 3299–3332) | Replaces one-shot transcription with session-based flow: start_session, push_audio_chunk in PCM chunks, subscribe to session events, accumulate fallback segments, finalize/cancel session, and return normalized segments preferring streamed events. |
| Live transcript persistence | apps/desktop/main.cjs (lines 2394–2409, 2402–2409) | retryTranscript and new writeLiveTranscript(paths, segments) flush live segments to JSONL and update metadata.transcriptSegmentCount on each flush. |
| Event mapping | apps/desktop/main.cjs (lines 3300–3332) | captureEventFromLocalSTTEvent maps local STT events (partial_segment, final_segment, progress, error, session_finished) into app capture events for broadcasting and UI updates. |
| PCM streaming helpers | apps/desktop/main.cjs (lines 3334–3432) | Adds readLocalSTTPcmChunks and readPcm16WavHeader to read PCM16 WAV and yield mono float PCM chunks for streaming. |
| Server bridge events | apps/desktop/main.cjs (lines 3472–3571) | LocalSTTServerBridge gains eventHandlers, emitEvent, and onEvent; stdout parser now dispatches response.event frames to registered handlers. |
| Warmup / Initialization | apps/desktop/main.cjs (lines 3630–3644, 1214–1216, 3973–3977, 4016–4018) | Adds warmLocalSTTServer() and invokes it after model download, after settings:update, and on app startup (no-op in test mode). |
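
The PCM streaming helpers row describes turning PCM16 WAV data into mono float chunks. A minimal sketch of that conversion is below, assuming interleaved little-endian samples with the WAV header already stripped. The helper name `pcm16ToMonoFloatChunks` and its parameters are hypothetical, the downmix by channel averaging is an assumption, and only the 5-second constant comes from the table.

```javascript
// Value taken from the walkthrough table.
const LOCAL_STT_STREAM_CHUNK_SECONDS = 5;

// Hypothetical helper: `pcm` is a Buffer of interleaved little-endian PCM16
// frames (the body of the WAV "data" chunk).
function* pcm16ToMonoFloatChunks(
  pcm,
  { channels, sampleRate },
  chunkSeconds = LOCAL_STT_STREAM_CHUNK_SECONDS
) {
  const frameBytes = channels * 2; // 2 bytes per 16-bit sample
  const totalFrames = Math.floor(pcm.length / frameBytes);
  const framesPerChunk = Math.max(1, Math.floor(sampleRate * chunkSeconds));
  for (let start = 0; start < totalFrames; start += framesPerChunk) {
    const frames = Math.min(framesPerChunk, totalFrames - start);
    const chunk = new Float32Array(frames);
    for (let i = 0; i < frames; i++) {
      const base = (start + i) * frameBytes;
      let sum = 0;
      for (let c = 0; c < channels; c++) sum += pcm.readInt16LE(base + c * 2);
      // Average the channels, then scale int16 [-32768, 32767] into [-1, 1).
      chunk[i] = sum / channels / 32768;
    }
    yield chunk;
  }
}
```

Using a generator keeps only one chunk of float samples alive at a time, which suits long recordings.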

Rust STT server: event-driven runtime, workers, and streaming jobs

| Layer | File(s) | Summary |
| --- | --- | --- |
| I/O framing & plumbing | native/stt-server-rust/src/main.rs (lines 8–25, 2983–3106) | Adds framed JSON request/response ResponseSink and RequestSource abstractions, spawns a stdio request reader thread, and replaces main with run_stdio_event_loop that decouples request reception from periodic event draining. |
| Wire/domain changes | native/stt-server-rust/src/main.rs (lines 83–138, 199–287) | ResponseEnvelope now can carry an optional SttEvent; introduces SttEvent, SttEventType, and SttProgress for partial/final segments, progress, errors, cancellation, and sessionFinished events. |
| Session & server state | native/stt-server-rust/src/main.rs (lines 288–440) | Replaces prior runtime state with RuntimeSession and streaming job/session-centric state: queued samples, in-flight jobs, per-session progress and helpers for session matching. |
| Runtime service abstraction | native/stt-server-rust/src/main.rs (lines 441–509) | Introduces SttRuntimeService and two implementations: LocalSttRuntimeService (inline worker) and ProcessSttRuntimeService (child worker process) to separate request handling from event draining. |
| Backend & worker layer | native/stt-server-rust/src/main.rs (lines 867–1739) | Adds LocalSttBackend trait and WhisperCppOnnxVadBackend implementation, SttWorker trait, BackgroundSttWorker/InlineSttWorker and ProcessSttWorker implementations; worker layer executes transcription jobs (file & chunk) and returns results/events. |
| Streaming helpers & normalization | native/stt-server-rust/src/main.rs (lines 2364–2480, 2584–2592) | Adds normalize_pcm_samples_for_whisper(...) and streaming helpers: push_streaming_audio_chunk, submit_queued_streaming_jobs, collect_completed_streaming_jobs, and flush_streaming_session for buffering, resampling, job submission, and finalization. |
| Request routing & event dispatch | native/stt-server-rust/src/main.rs (lines 563–745, 2697–2967) | handle_request refactored to route commands through worker APIs, update RuntimeSession, and emit segment/progress/lifecycle events via centralized response_events dispatch (including tail SessionFinished events). |
| Tests | native/stt-server-rust/src/main.rs (lines 3116–3826) | Extensive tests added/updated for framed I/O, PCM normalization/resampling, streaming buffering/job lifecycle, event ordering (event frames before final responses), finalization, cancellation, and worker contracts. |
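
The normalization row mentions resampling before audio reaches Whisper, which expects 16 kHz mono input. The Rust implementation is not shown on this page, so the following is only a JavaScript illustration of one common approach, linear interpolation, and not a port of normalize_pcm_samples_for_whisper. The function name `resampleLinear` is hypothetical.

```javascript
// Resample mono float samples from `fromRate` to `toRate` by linear
// interpolation between neighbouring input samples.
function resampleLinear(samples, fromRate, toRate) {
  if (samples.length === 0) return new Float32Array(0);
  if (fromRate === toRate) return Float32Array.from(samples);
  const outLength = Math.max(1, Math.round((samples.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = samples.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last); // clamp at the final sample
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

// Example: bring a 48 kHz block down to the 16 kHz rate Whisper expects.
const block16k = resampleLinear(new Float32Array(48000), 48000, 16000);
console.log(block16k.length); // 16000
```

Linear interpolation does no low-pass filtering, so a production downsampler would normally add an anti-aliasing step.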

Sequence Diagram

sequenceDiagram
    participant App as Desktop App
    participant STTServer as STT Server
    participant Worker as Backend Worker
    rect rgba(200,200,255,0.5)
    App->>STTServer: start_session
    activate STTServer
    STTServer->>STTServer: create RuntimeSession
    end
    loop PCM chunk loop
        rect rgba(200,255,200,0.5)
        App->>STTServer: push_audio_chunk (PCM)
        STTServer->>STTServer: buffer & normalize
        STTServer->>Worker: enqueue/submit job
        activate Worker
        Worker-->>STTServer: job result (segments)
        deactivate Worker
        STTServer-->>App: SttEvent(partial_segment / progress)
        App->>App: writeLiveTranscript
        end
    end
    App->>STTServer: finalize_session
    STTServer->>Worker: flush remaining jobs
    activate Worker
    Worker-->>STTServer: final segments
    deactivate Worker
    STTServer-->>App: SttEvent(final_segment)
    STTServer-->>App: SttEvent(session_finished)
    deactivate STTServer
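
The SttEvent arrows in the diagram imply that the desktop bridge has to separate event frames from ordinary responses on the same stdout stream. A minimal sketch of that dispatch follows, assuming one JSON object per newline-terminated line (the page says only "framed JSON", so the framing is an assumption). The member names eventHandlers, emitEvent, and onEvent come from the walkthrough; the class name, `handleStdout`, `responses`, and the event `type` field are hypothetical.

```javascript
// Sketch of a bridge that routes frames carrying an `event` to subscribers.
class EventBridgeSketch {
  constructor() {
    this.eventHandlers = new Set();
    this.responses = []; // non-event frames, kept here only for illustration
    this.buffer = "";
  }

  // Register a handler and return a function that removes it again.
  onEvent(handler) {
    this.eventHandlers.add(handler);
    return () => this.eventHandlers.delete(handler);
  }

  emitEvent(event) {
    for (const handler of this.eventHandlers) handler(event);
  }

  // Feed raw stdout text; frames may be split across calls.
  handleStdout(text) {
    this.buffer += text;
    let newline;
    while ((newline = this.buffer.indexOf("\n")) >= 0) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (!line) continue;
      const frame = JSON.parse(line);
      if (frame.event) this.emitEvent(frame.event);
      else this.responses.push(frame);
    }
  }
}
```

Buffering until a full line arrives matters because a child process can deliver one frame across several stdout reads.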

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰
I nibble bytes of spoken streams,
Chunks stitched soft like carrot dreams.
Segments hop in tidy rows,
Live the transcript garden grows—
Hooray for sessions, live and green!



@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the local Speech-to-Text (STT) system to support streaming audio processing and live transcript updates. It introduces a worker process model for the STT server and implements session-based chunking for audio data. Feedback includes optimizing memory allocation for WAV header parsing and refining the PCM normalization calculation to ensure audio samples are correctly scaled.
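
The individual review comments are collapsed on this page, so the exact suggestion is not visible. As one guess at what the allocation feedback could point to, the sketch below reads only a bounded prefix of the file to locate the WAV "fmt " and "data" chunks rather than loading the whole recording. The function name `readPcm16WavHeaderSketch`, the 4096-byte limit, and the returned field names are all assumptions.

```javascript
const fs = require("node:fs");

// Hypothetical helper: parse the RIFF/WAVE header from a bounded file prefix.
function readPcm16WavHeaderSketch(filePath, maxHeaderBytes = 4096) {
  const fd = fs.openSync(filePath, "r");
  try {
    const head = Buffer.alloc(maxHeaderBytes);
    const length = fs.readSync(fd, head, 0, maxHeaderBytes, 0);
    if (
      length < 12 ||
      head.toString("ascii", 0, 4) !== "RIFF" ||
      head.toString("ascii", 8, 12) !== "WAVE"
    ) {
      throw new Error("not a RIFF/WAVE file");
    }
    let offset = 12;
    let format = null;
    while (offset + 8 <= length) {
      const id = head.toString("ascii", offset, offset + 4);
      const size = head.readUInt32LE(offset + 4);
      const body = offset + 8;
      if (id === "fmt " && body + 16 <= length) {
        format = {
          audioFormat: head.readUInt16LE(body), // 1 means integer PCM
          channels: head.readUInt16LE(body + 2),
          sampleRate: head.readUInt32LE(body + 4),
          bitsPerSample: head.readUInt16LE(body + 14),
        };
      } else if (id === "data") {
        if (!format || format.audioFormat !== 1 || format.bitsPerSample !== 16) {
          throw new Error("expected PCM16 audio");
        }
        return { ...format, dataOffset: body, dataLength: size };
      }
      offset = body + size + (size % 2); // RIFF chunks are word-aligned
    }
    throw new Error(`no data chunk within the first ${maxHeaderBytes} bytes`);
  } finally {
    fs.closeSync(fd);
  }
}
```

The returned dataOffset lets the caller stream the sample data from disk in pieces afterwards.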

Comment thread apps/desktop/main.cjs Outdated
Comment thread apps/desktop/main.cjs Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf450b4e61

Comment thread apps/desktop/main.cjs Outdated
Comment thread apps/desktop/main.cjs Outdated
@qyinm qyinm merged commit fc0daef into main May 3, 2026
2 checks passed
@qyinm qyinm deleted the qyinm-stt-runtime-service-boundary branch May 3, 2026 14:46