Make local STT a streaming runtime service #11
📝 Walkthrough
The PR converts local STT from one-shot file transcription to a streaming session model. The desktop app streams PCM chunks, receives incremental segment/progress events, and writes live transcripts. The Rust STT server is refactored into an event-driven runtime with worker abstractions, streaming job queues, and session lifecycle/events. warmLocalSTTServer is added.
Changes
Desktop app: streaming integration & wiring
Rust STT server: event-driven runtime, workers, and streaming jobs
Sequence Diagram
sequenceDiagram
participant App as Desktop App
participant STTServer as STT Server
participant Worker as Backend Worker
rect rgba(200,200,255,0.5)
App->>STTServer: start_session
activate STTServer
STTServer->>STTServer: create RuntimeSession
end
loop PCM chunk loop
rect rgba(200,255,200,0.5)
App->>STTServer: push_audio_chunk (PCM)
STTServer->>STTServer: buffer & normalize
STTServer->>Worker: enqueue/submit job
activate Worker
Worker-->>STTServer: job result (segments)
deactivate Worker
STTServer-->>App: SttEvent(partial_segment / progress)
App->>App: writeLiveTranscript
end
end
App->>STTServer: finalize_session
STTServer->>Worker: flush remaining jobs
activate Worker
Worker-->>STTServer: final segments
deactivate Worker
STTServer-->>App: SttEvent(final_segment)
STTServer-->>App: SttEvent(session_finished)
deactivate STTServer
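The session flow in the diagram can be sketched in a few lines of Rust. This is a minimal illustration, not the PR's code: the event names follow the diagram, but the payload fields, the `RuntimeSession` layout, the fixed `job_len` chunking, and the `transcribe` callback are all assumptions made for the example.

```rust
/// Event names mirror the diagram; the payload fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum SttEvent {
    PartialSegment { text: String },
    Progress { samples_seen: usize },
    FinalSegment { text: String },
    SessionFinished,
}

/// Hypothetical stand-in for the server-side session state.
pub struct RuntimeSession {
    buffer: Vec<f32>,
    samples_seen: usize,
    job_len: usize,
}

impl RuntimeSession {
    /// start_session: `job_len` is the number of samples per job (assumed fixed here).
    pub fn start(job_len: usize) -> Self {
        RuntimeSession { buffer: Vec::new(), samples_seen: 0, job_len: job_len.max(1) }
    }

    /// push_audio_chunk: buffer the samples, run one job per full `job_len`
    /// worth of audio, then report progress.
    pub fn push_audio_chunk(
        &mut self,
        chunk: &[f32],
        transcribe: &mut dyn FnMut(&[f32]) -> String,
    ) -> Vec<SttEvent> {
        self.buffer.extend_from_slice(chunk);
        self.samples_seen += chunk.len();
        let mut events = Vec::new();
        while self.buffer.len() >= self.job_len {
            let job: Vec<f32> = self.buffer.drain(..self.job_len).collect();
            events.push(SttEvent::PartialSegment { text: transcribe(&job) });
        }
        events.push(SttEvent::Progress { samples_seen: self.samples_seen });
        events
    }

    /// finalize_session: flush whatever is still buffered, then close the session.
    pub fn finalize(self, transcribe: &mut dyn FnMut(&[f32]) -> String) -> Vec<SttEvent> {
        let mut events = Vec::new();
        if !self.buffer.is_empty() {
            events.push(SttEvent::FinalSegment { text: transcribe(&self.buffer) });
        }
        events.push(SttEvent::SessionFinished);
        events
    }
}
```

A caller would create one session per recording, call `push_audio_chunk` for each PCM chunk as it arrives, forward the returned events to the UI, and call `finalize` once when recording stops.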
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Code Review
This pull request refactors the local Speech-to-Text (STT) system to support streaming audio processing and live transcript updates. It introduces a worker process model for the STT server and implements session-based chunking for audio data. Feedback includes optimizing memory allocation for WAV header parsing and refining the PCM normalization calculation to ensure audio samples are correctly scaled.
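The review summary does not quote the normalization code, so the following is only a sketch of one common convention for scaling 16-bit PCM, not the PR's implementation. The function name `normalize_pcm16_le` is hypothetical, and the input is assumed to be signed 16-bit little-endian samples.

```rust
/// Convert signed 16-bit little-endian PCM bytes into f32 samples.
/// Dividing by 32768.0 (rather than 32767.0) maps i16::MIN to exactly -1.0
/// and keeps every sample inside [-1.0, 1.0).
/// A trailing odd byte, if any, is ignored.
pub fn normalize_pcm16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / 32768.0)
        .collect()
}
```

Dividing by 32767.0 instead would push the most negative sample slightly below -1.0, which is one way a normalization step can end up incorrectly scaled.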
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cf450b4e61
Summary
Moves local STT behind a warm streaming runtime service with a default worker-process boundary, startup/model-change warmup, and event-driven segment output.
Why
Long recordings should not be transcribed as one large batch. The STT runtime needs to stay warm, reuse model/VAD context, process streamed audio by speech segment, and emit progress/final events continuously.
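As a rough illustration of the "stay warm" idea, the sketch below loads a placeholder model once and then serves segment jobs over channels. It is an assumption-laden simplification: it uses a thread where the PR describes a worker process, and `Model`, `Job`, and `spawn_warm_worker` are hypothetical names that do not come from the PR.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical placeholder for an expensive-to-load model / VAD context.
struct Model;

impl Model {
    fn load() -> Self {
        Model
    }

    fn transcribe(&self, segment: &[f32]) -> String {
        format!("segment of {} samples", segment.len())
    }
}

/// Work items sent to the worker (assumed shape).
#[derive(Debug)]
pub enum Job {
    Segment(Vec<f32>),
    Shutdown,
}

/// Spawn a worker that loads the model once and then serves jobs until
/// it receives `Job::Shutdown` or every sender is dropped.
pub fn spawn_warm_worker() -> (mpsc::Sender<Job>, mpsc::Receiver<String>, thread::JoinHandle<()>) {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (out_tx, out_rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        // The load cost is paid once and the model is reused for every job.
        let model = Model::load();
        for job in job_rx {
            match job {
                Job::Segment(samples) => {
                    // Stop if the receiving side has gone away.
                    if out_tx.send(model.transcribe(&samples)).is_err() {
                        break;
                    }
                }
                Job::Shutdown => break,
            }
        }
    });
    (job_tx, out_rx, handle)
}
```

Because results are sent as soon as each segment finishes, a consumer can display output continuously instead of waiting for the whole recording to be processed.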
Changes
MIRRORNOTE_STT_DISABLE_PROCESS_WORKER=1.
Summary by CodeRabbit
New Features
Bug Fixes / Reliability