Make local STT a streaming runtime service by qyinm · Pull Request #11 · qyinm/MirrorNote

qyinm · 2026-05-03T14:26:00Z

Summary

Moves local STT behind a warm streaming runtime service with a default worker-process boundary, startup/model-change warmup, and event-driven segment output.

Why

Long recordings should not be transcribed as one large batch. The STT runtime needs to stay warm, reuse model/VAD context, process streamed audio by speech segment, and emit progress/final events continuously.

Changes

Adds process-worker STT runtime boundary with inline fallback via MIRRORNOTE_STT_DISABLE_PROCESS_WORKER=1.
Warms Whisper and Silero ONNX VAD contexts on app startup, model download, and settings changes.
Streams PCM chunks through VAD speech segmentation and emits final segment, progress, cancel, error, and session-finished events.
Keeps Electron focused on orchestration: capture/file IO, PCM streaming, and transcript/UI updates.
Adds backend/worker contracts for future MLX Whisper, faster-whisper, remote STT, or diarization backends.
Verified with Rust tests, desktop checks, a Metal process-worker smoke test, and a 1-hour silence streaming stress test.
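
As a rough illustration of the inline fallback flag, the mode selection could look like the sketch below. This is not code from the PR: `resolveSttWorkerMode` and the mode names `"process"` and `"inline"` are hypothetical, and treating only the exact value `1` as the opt-out is an assumption based on the description above.

```javascript
// Hypothetical helper. The PR description only documents that
// MIRRORNOTE_STT_DISABLE_PROCESS_WORKER=1 selects the inline fallback.
function resolveSttWorkerMode(env = process.env) {
  // Assumption: any other value (including unset) keeps the default
  // worker-process boundary.
  return env.MIRRORNOTE_STT_DISABLE_PROCESS_WORKER === "1" ? "inline" : "process";
}
```

Passing the environment in as a parameter keeps the decision easy to test without mutating `process.env`.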

Summary by CodeRabbit

New Features
- Real-time partial and final STT segments appear live during transcription and are written to the transcript file as they arrive.
- Transcription now reports progress and session state updates during processing.
Bug Fixes / Reliability
- More robust streaming transcription with improved error handling and fallback finalization.
- Proactive STT warmup for faster availability on startup and after settings changes.

coderabbitai · 2026-05-03T14:26:08Z

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between cf450b4 and 71cee3f.

📒 Files selected for processing (1)

apps/desktop/main.cjs

📝 Walkthrough

The PR converts local STT from one-shot file transcription to a streaming session model. The desktop app streams PCM chunks, receives incremental segment and progress events, and writes live transcripts. The Rust STT server is refactored into an event-driven runtime with worker abstractions, streaming job queues, and session lifecycle events. A new `warmLocalSTTServer()` function warms the runtime on app startup, after model download, and after settings changes.

Changes

Desktop app: streaming integration & wiring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config / Constants | `apps/desktop/main.cjs` | Adds `LOCAL_STT_STREAM_CHUNK_SECONDS = 5` to control streaming chunk duration. |
| API Surface | `apps/desktop/main.cjs` (lines 3174–3180) | `transcribeRecordingFile(recordingPath, transcriptionConfiguration)` now accepts an `options = {}` object and forwards it to the local STT caller. |
| Streaming Flow (core) | `apps/desktop/main.cjs` (lines 3225–3297, 3299–3332) | Replaces one-shot transcription with session-based flow: `start_session`, `push_audio_chunk` in PCM chunks, subscribe to session events, accumulate fallback segments, finalize/cancel session, and return normalized segments preferring streamed events. |
| Live transcript persistence | `apps/desktop/main.cjs` (lines 2394–2409, 2402–2409) | `retryTranscript` and new `writeLiveTranscript(paths, segments)` flush live segments to JSONL and update `metadata.transcriptSegmentCount` on each flush. |
| Event mapping | `apps/desktop/main.cjs` (lines 3300–3332) | `captureEventFromLocalSTTEvent` maps local STT events (`partial_segment`, `final_segment`, `progress`, `error`, `session_finished`) into app capture events for broadcasting and UI updates. |
| PCM streaming helpers | `apps/desktop/main.cjs` (lines 3334–3432) | Adds `readLocalSTTPcmChunks` and `readPcm16WavHeader` to read PCM16 WAV and yield mono float PCM chunks for streaming. |
| Server bridge events | `apps/desktop/main.cjs` (lines 3472–3571) | `LocalSTTServerBridge` gains `eventHandlers`, `emitEvent`, and `onEvent`; stdout parser now dispatches `response.event` frames to registered handlers. |
| Warmup / Initialization | `apps/desktop/main.cjs` (lines 3630–3644, 1214–1216, 3973–3977, 4016–4018) | Adds `warmLocalSTTServer()` and invokes it after model download, after settings:update, and on app startup (no-op in test mode). |
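
To make the PCM streaming helpers row more concrete, here is a minimal sketch of yielding mono float chunks from raw PCM16 data in 5-second pieces. It is not the PR's `readLocalSTTPcmChunks`: the name `pcm16ToMonoFloatChunks` is hypothetical, it works on an in-memory `Buffer` rather than a file, and both the channel-averaging downmix and the divide-by-32768 scaling are assumptions, since the actual arithmetic is not shown in this page.

```javascript
const LOCAL_STT_STREAM_CHUNK_SECONDS = 5; // value stated in the walkthrough table

// Hypothetical helper: yields Float32Array chunks from interleaved
// little-endian PCM16 data, downmixing to mono by averaging channels.
function* pcm16ToMonoFloatChunks(
  pcmBuffer,
  sampleRate,
  channels,
  chunkSeconds = LOCAL_STT_STREAM_CHUNK_SECONDS
) {
  const bytesPerFrame = 2 * channels;
  const totalFrames = Math.floor(pcmBuffer.length / bytesPerFrame);
  const framesPerChunk = Math.max(1, Math.floor(sampleRate * chunkSeconds));

  for (let start = 0; start < totalFrames; start += framesPerChunk) {
    const end = Math.min(start + framesPerChunk, totalFrames);
    const chunk = new Float32Array(end - start);
    for (let frame = start; frame < end; frame += 1) {
      let sum = 0;
      for (let ch = 0; ch < channels; ch += 1) {
        sum += pcmBuffer.readInt16LE((frame * channels + ch) * 2);
      }
      // Assumed scaling: map int16 to [-1, 1); the clamp is purely defensive.
      chunk[frame - start] = Math.max(-1, Math.min(1, sum / channels / 32768));
    }
    yield chunk;
  }
}
```

A generator keeps only one chunk in memory at a time, which matches the stated goal of not treating a long recording as a single batch.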

Rust STT server: event-driven runtime, workers, and streaming jobs

| Layer | File(s) | Summary |
| --- | --- | --- |
| I/O framing & plumbing | `native/stt-server-rust/src/main.rs` (lines 8–25, 2983–3106) | Adds framed JSON request/response `ResponseSink` and `RequestSource` abstractions, spawns a stdio request reader thread, and replaces main with `run_stdio_event_loop` that decouples request reception from periodic event draining. |
| Wire/domain changes | `native/stt-server-rust/src/main.rs` (lines 83–138, 199–287) | `ResponseEnvelope` now can carry an optional `SttEvent`; introduces `SttEvent`, `SttEventType`, and `SttProgress` for partial/final segments, progress, errors, cancellation, and sessionFinished events. |
| Session & server state | `native/stt-server-rust/src/main.rs` (lines 288–440) | Replaces prior runtime state with `RuntimeSession` and streaming job/session-centric state: queued samples, in-flight jobs, per-session progress and helpers for session matching. |
| Runtime service abstraction | `native/stt-server-rust/src/main.rs` (lines 441–509) | Introduces `SttRuntimeService` and two implementations: `LocalSttRuntimeService` (inline worker) and `ProcessSttRuntimeService` (child worker process) to separate request handling from event draining. |
| Backend & worker layer | `native/stt-server-rust/src/main.rs` (lines 867–1739) | Adds `LocalSttBackend` trait and `WhisperCppOnnxVadBackend` implementation, `SttWorker` trait, `BackgroundSttWorker`/`InlineSttWorker` and `ProcessSttWorker` implementations; worker layer executes transcription jobs (file & chunk) and returns results/events. |
| Streaming helpers & normalization | `native/stt-server-rust/src/main.rs` (lines 2364–2480, 2584–2592) | Adds `normalize_pcm_samples_for_whisper(...)` and streaming helpers: `push_streaming_audio_chunk`, `submit_queued_streaming_jobs`, `collect_completed_streaming_jobs`, and `flush_streaming_session` for buffering, resampling, job submission, and finalization. |
| Request routing & event dispatch | `native/stt-server-rust/src/main.rs` (lines 563–745, 2697–2967) | `handle_request` refactored to route commands through worker APIs, update `RuntimeSession`, and emit segment/progress/lifecycle events via centralized `response_events` dispatch (including tail `SessionFinished` events). |
| Tests | `native/stt-server-rust/src/main.rs` (lines 3116–3826) | Extensive tests added/updated for framed I/O, PCM normalization/resampling, streaming buffering/job lifecycle, event ordering (event frames before final responses), finalization, cancellation, and worker contracts. |
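
The buffering and flush behaviour described for the streaming helpers can be modelled roughly as follows. This is a conceptual sketch written in JavaScript for illustration, while the real helpers are Rust functions in `native/stt-server-rust/src/main.rs`. The class `StreamingSessionBuffer` and its fields are hypothetical, fixed-size jobs stand in for the VAD-driven speech segmentation the PR describes, and resampling is omitted.

```javascript
// Hypothetical model of per-session buffering; all names are illustrative.
class StreamingSessionBuffer {
  constructor(sampleRate, jobSamples) {
    if (!Number.isInteger(jobSamples) || jobSamples <= 0) {
      throw new RangeError("jobSamples must be a positive integer");
    }
    this.sampleRate = sampleRate;
    this.jobSamples = jobSamples; // assumed fixed job size, standing in for VAD segments
    this.queued = [];
    this.submittedSamples = 0;
  }

  // Queue new samples and return any jobs that are now full.
  push(samples) {
    for (const sample of samples) this.queued.push(sample);
    return this.drain(false);
  }

  // On finalization, also submit whatever partial tail is left.
  flush() {
    return this.drain(true);
  }

  drain(includeTail) {
    const jobs = [];
    while (
      this.queued.length >= this.jobSamples ||
      (includeTail && this.queued.length > 0)
    ) {
      const samples = this.queued.splice(0, this.jobSamples);
      // Each job carries its start offset so segment times stay session-relative.
      jobs.push({ startSeconds: this.submittedSamples / this.sampleRate, samples });
      this.submittedSamples += samples.length;
    }
    return jobs;
  }
}
```

Tracking `submittedSamples` is one way to report progress by audio time, as one of the commit titles in this PR mentions; whether the server does it this way is not shown here.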

Sequence Diagram

sequenceDiagram
    participant App as Desktop App
    participant STTServer as STT Server
    participant Worker as Backend Worker
    rect rgba(200,200,255,0.5)
    App->>STTServer: start_session
    activate STTServer
    STTServer->>STTServer: create RuntimeSession
    end
    loop PCM chunk loop
        rect rgba(200,255,200,0.5)
        App->>STTServer: push_audio_chunk (PCM)
        STTServer->>STTServer: buffer & normalize
        STTServer->>Worker: enqueue/submit job
        activate Worker
        Worker-->>STTServer: job result (segments)
        deactivate Worker
        STTServer-->>App: SttEvent(partial_segment / progress)
        App->>App: writeLiveTranscript
        end
    end
    App->>STTServer: finalize_session
    STTServer->>Worker: flush remaining jobs
    activate Worker
    Worker-->>STTServer: final segments
    deactivate Worker
    STTServer-->>App: SttEvent(final_segment)
    STTServer-->>App: SttEvent(session_finished)
    deactivate STTServer
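
The client side of the sequence above can be sketched against an in-memory stand-in for the server. Only the command names (`start_session`, `push_audio_chunk`, `finalize_session`), the event type names, and the `onEvent`/`emitEvent` methods come from the walkthrough. `FakeSttBridge`, its `request()` method, `transcribeStreaming`, and every payload field (`sessionId`, `segment`, `segments`, `chunks`) are assumptions made for illustration.

```javascript
// Hypothetical in-memory stand-in for the STT bridge.
class FakeSttBridge {
  constructor() {
    this.eventHandlers = new Set();
    this.chunkCount = 0;
  }

  onEvent(handler) {
    this.eventHandlers.add(handler);
    return () => this.eventHandlers.delete(handler); // unsubscribe function
  }

  emitEvent(event) {
    for (const handler of this.eventHandlers) handler(event);
  }

  async request(command, payload) {
    if (command === "push_audio_chunk") {
      this.chunkCount += 1;
      this.emitEvent({ type: "progress", sessionId: payload.sessionId, chunks: this.chunkCount });
    } else if (command === "finalize_session") {
      const segment = { text: `transcribed ${this.chunkCount} chunk(s)` };
      this.emitEvent({ type: "final_segment", sessionId: payload.sessionId, segment });
      this.emitEvent({ type: "session_finished", sessionId: payload.sessionId });
      return { segments: [segment] }; // fallback copy in the response body
    }
    return {};
  }
}

async function transcribeStreaming(bridge, sessionId, chunks) {
  const streamed = [];
  let finished = false;
  const unsubscribe = bridge.onEvent((event) => {
    if (event.sessionId !== sessionId) return; // ignore events from other sessions
    if (event.type === "final_segment") streamed.push(event.segment);
    if (event.type === "session_finished") finished = true;
  });
  try {
    await bridge.request("start_session", { sessionId });
    for (const samples of chunks) {
      await bridge.request("push_audio_chunk", { sessionId, samples });
    }
    const response = await bridge.request("finalize_session", { sessionId });
    const fallback = response.segments || [];
    // Prefer segments that arrived as events; fall back to the response body.
    return { segments: streamed.length > 0 ? streamed : fallback, finished };
  } finally {
    unsubscribe();
  }
}
```

Unsubscribing in `finally` keeps a failed session from leaving a stale handler registered on the bridge.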

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

gemini-code-assist

Code Review

This pull request refactors the local Speech-to-Text (STT) system to support streaming audio processing and live transcript updates. It introduces a worker process model for the STT server and implements session-based chunking for audio data. Feedback includes optimizing memory allocation for WAV header parsing and refining the PCM normalization calculation to ensure audio samples are correctly scaled.
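
For context on the WAV header feedback, a chunk-by-chunk RIFF scan might look like the sketch below, which locates the `fmt ` and `data` chunks instead of assuming a fixed-size header. It is not the PR's `readPcm16WavHeader`: `parsePcm16WavHeader` is a hypothetical name, it operates on an in-memory `Buffer` rather than reading from disk, and the returned field names are assumptions.

```javascript
// Hypothetical parser: walks RIFF chunks until it finds "fmt " and "data".
function parsePcm16WavHeader(buffer) {
  if (
    buffer.length < 12 ||
    buffer.toString("ascii", 0, 4) !== "RIFF" ||
    buffer.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("not a RIFF/WAVE file");
  }

  let offset = 12;
  let format = null;
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString("ascii", offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    const body = offset + 8;

    if (id === "fmt ") {
      if (size < 16 || body + 16 > buffer.length) throw new Error("truncated fmt chunk");
      format = {
        audioFormat: buffer.readUInt16LE(body), // 1 means integer PCM
        channels: buffer.readUInt16LE(body + 2),
        sampleRate: buffer.readUInt32LE(body + 4),
        bitsPerSample: buffer.readUInt16LE(body + 14),
      };
    } else if (id === "data") {
      if (!format) throw new Error("data chunk before fmt chunk");
      if (format.audioFormat !== 1 || format.bitsPerSample !== 16) {
        throw new Error("only PCM16 is supported");
      }
      // Tolerate a declared size that runs past the end of the buffer.
      return { ...format, dataOffset: body, dataLength: Math.min(size, buffer.length - body) };
    }

    // RIFF chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }
  throw new Error("no data chunk found");
}
```

Scanning this way tolerates extra chunks such as `LIST` metadata placed before the audio data, which a fixed 44-byte header assumption would misread.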

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf450b4e61

qyinm added 30 commits May 3, 2026 21:59

Separate STT runtime service from stdio transport

ed66f88

Transcribe PCM chunks without temp WAV files

7e6ce06

Emit local STT runtime event frames

db31816

Add streaming audio chunk command

7a778cc

Keep STT session configuration in runtime state

3ce4df0

Buffer streaming STT audio in runtime sessions

ca106da

Stream local STT audio from Electron

efebfe5

Emit final STT segment events on session flush

3df5990

Forward STT runtime events to capture listeners

8ad434c

Persist live STT transcript segments

5fee331

Wrap whisper STT behind backend interface

4beb45f

Queue streaming STT transcription jobs

b66cebc

Report streaming STT progress by audio time

74bc522

Prepare STT response sink for async events

b985f8d

Make streaming STT jobs self-contained

682537f

Isolate streaming STT job execution

ba26181

Separate STT backend from runtime state

70cfa6e

Isolate STT request source

54938b4

Introduce STT worker boundary

d00f139

Route STT worker through owned commands

5da8be2

Run STT backend on a worker thread

7dc8b3f

Submit streaming STT jobs asynchronously

2bc12c0

Drain streaming STT events through runtime service

8a0f58f

Pump STT runtime events from stdio loop

5b636e9

Drop stale STT job results after cancel

9f52955

Verify async STT job submission

c679e7e

Keep Metal STT backend on runtime thread

9d86889

Add opt-in STT worker process boundary

f0643a1

Enable STT worker process by default

8c284bb

Warm local STT runtime on startup

112c7c8

qyinm added 2 commits May 3, 2026 23:20

Emit final events from STT worker results

7972ffe

Refresh STT warmup after model changes

cf450b4

gemini-code-assist Bot reviewed May 3, 2026

Comment thread apps/desktop/main.cjs Outdated

Comment thread apps/desktop/main.cjs Outdated

chatgpt-codex-connector Bot reviewed May 3, 2026

Comment thread apps/desktop/main.cjs Outdated

Comment thread apps/desktop/main.cjs Outdated

qyinm added 3 commits May 3, 2026 23:41

Ignore STT events outside active session

253df8f

Clamp PCM16 normalization range

961e88c

Scan WAV chunks without header cap

71cee3f

qyinm merged commit fc0daef into main May 3, 2026
2 checks passed

qyinm deleted the qyinm-stt-runtime-service-boundary branch May 3, 2026 14:46

qyinm merged 35 commits into main from qyinm-stt-runtime-service-boundary
