[voice] Add dedicated STT latency and TTS generation speed benchmarks#3442
Merged
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve STT latency and TTS generation speed benchmarks" to "[voice] Add dedicated STT latency and TTS generation speed benchmarks" on Mar 1, 2026
`bench_voice_assistant.cpp` had a `Stubs: 1` annotation and no isolated benchmarks for STT or TTS performance, only full-pipeline (`processVoiceCommand`) measurements that bundle STT + LLM + TTS together, making it impossible to attribute latency regressions.

Changes

`benchmarks/bench_voice_assistant.cpp`

STT Latency Benchmarks (exercise `STTProcessor::transcribe()` directly):
- `BM_STTLatency_Short`: 1 s fixed baseline
- `BM_STTLatency_ByDuration`: 0.5 / 1 / 5 / 30 / 60 s inputs (latency-vs-duration scaling)
- `BM_STTLatency_WithDiarization`: speaker diarization path (1 / 5 / 30 s)
- `BM_STTLatency_Streaming`: `streamTranscribe()` callback path (1 / 5 / 30 s)
- `BM_STTStatistics`: `getStatistics()` overhead

TTS Generation Speed Benchmarks (exercise `TTSProcessor::synthesize()` directly):
- `BM_TTSGenSpeed_Short`: short phrase baseline
- `BM_TTSGenSpeed_ByLength`: 50 / 200 / 500 / 2000 char texts
- `BM_TTSGenSpeed_WithOptions`: speed variants 0.5× / 1.0× / 1.5× / 2.0×
- `BM_TTSGenSpeed_Streaming`: `streamSynthesize()` callback throughput
- `BM_TTSAvailableVoices` / `BM_TTSStatistics`: auxiliary overhead

Helpers added:
- `buildWavBlob(duration_ms)`: generates a standards-compliant 16-bit PCM WAV of arbitrary duration (noise, deterministic seed) for use as STT input without requiring a real audio file
- `getSTTProcessor()` / `getTTSProcessor()`: lazy-initialised singletons; initialization cost paid once per binary run, not per benchmark iteration

Each benchmark emits `items_processed`, `bytes_processed` (where relevant), and domain counters (`audio_duration_ms`, `text_chars`, `speed_x10`) for CI regression tracking.

File header updated: `Stubs: 0`, version `0.0.33`, quality score `100.0`.

`src/voice/ROADMAP.md`

Performance benchmark checklist item updated from `[I]` → `[P]`.

Type of Change
Testing
📚 Research & Knowledge (if applicable)
`/docs/research/` created?
`/docs/research/implementation_influence/` entered?
Relevant sources:
Checklist
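For readers who want a concrete picture of the `buildWavBlob(duration_ms)` helper listed under Changes, here is a minimal sketch, not the PR's actual code. The 16 kHz mono format, the `seed` parameter, the `appendLE`/`appendTag` helpers, and the `std::vector<std::uint8_t>` return type are all assumptions; the description only states 16-bit PCM, noise content, and a deterministic seed.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Assumed format parameters; the PR description only states "16-bit PCM".
constexpr std::uint32_t kSampleRate = 16000;   // Hz (assumption)
constexpr std::uint16_t kChannels = 1;         // mono (assumption)
constexpr std::uint16_t kBitsPerSample = 16;   // stated in the PR description

// Append the low `bytes` bytes of `value` in little-endian order, as RIFF requires.
inline void appendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu));
    }
}

// Append a four-character chunk tag such as "RIFF" (the trailing NUL is skipped).
inline void appendTag(std::vector<std::uint8_t>& out, const char (&tag)[5]) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical sketch: a canonical 44-byte header followed by seeded noise samples.
inline std::vector<std::uint8_t> buildWavBlob(std::uint32_t duration_ms,
                                              std::uint32_t seed = 42) {
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(kChannels * (kBitsPerSample / 8));
    const std::uint32_t frames = static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(kSampleRate) * duration_ms / 1000);
    const std::uint32_t dataBytes = frames * blockAlign;

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + dataBytes);

    appendTag(wav, "RIFF");
    appendLE(wav, 36 + dataBytes, 4);            // bytes following this field
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLE(wav, 16, 4);                        // fmt chunk size for plain PCM
    appendLE(wav, 1, 2);                         // audio format 1 = PCM
    appendLE(wav, kChannels, 2);
    appendLE(wav, kSampleRate, 4);
    appendLE(wav, kSampleRate * blockAlign, 4);  // byte rate
    appendLE(wav, blockAlign, 2);
    appendLE(wav, kBitsPerSample, 2);
    appendTag(wav, "data");
    appendLE(wav, dataBytes, 4);
    assert(wav.size() == 44);                    // header must be exactly 44 bytes

    // Use the upper 16 bits of each raw engine draw as one 16-bit sample.
    std::mt19937 rng(seed);
    for (std::uint32_t i = 0; i < frames * kChannels; ++i) {
        appendLE(wav, static_cast<std::uint32_t>(rng() >> 16), 2);
    }
    return wav;
}
```

Raw `std::mt19937` output is used rather than a distribution because the engine's sequence is fixed by the C++ standard, so the same seed yields byte-identical blobs across standard libraries. Under the assumed format, a 1 s input is 44 + 32,000 bytes, a figure that could feed a `bytes_processed`-style counter.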
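The lazy-initialised accessors listed under Helpers added (`getSTTProcessor()` / `getTTSProcessor()`) could follow the function-local static pattern sketched here. `FakeSTTProcessor` and its `initCount()` counter are hypothetical stand-ins so the sketch compiles on its own; the real `STTProcessor` constructor and configuration are not shown in this PR, and the PR may implement the singleton differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the real STTProcessor, whose setup is not shown in the PR.
struct FakeSTTProcessor {
    // Counts constructions so the "initialised once" property can be observed.
    static int& initCount() {
        static int count = 0;
        return count;
    }

    FakeSTTProcessor() { ++initCount(); }  // imagine expensive model loading here

    // Placeholder work: reports the input size instead of transcribing audio.
    std::string transcribe(const std::string& audio) const {
        return "bytes=" + std::to_string(audio.size());
    }
};

// Function-local static: constructed on the first call only, and that
// initialisation is thread-safe since C++11.
inline FakeSTTProcessor& getSTTProcessor() {
    static FakeSTTProcessor instance;
    assert(FakeSTTProcessor::initCount() == 1);  // never constructed twice
    return instance;
}
```

With this shape, a benchmark body can call `getSTTProcessor()` on every iteration and still pay the construction cost only once per binary run, which keeps setup time out of the measured latency.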
Original prompt
This section details the original issue you should resolve.
<issue_title>[voice] Improve STT latency and TTS gen speed benchmarks</issue_title>
<issue_description># [voice] Improve STT latency and TTS gen speed benchmarks
Summary
Implementation Context
Open Issues: TODOs: 0, Stubs: 1
Implementation Phases
Phase 1: Voice Pipeline & Session Management (Status: Completed)
- `VoiceAssistant` central coordinator for all voice interaction
- `EmbeddedLLM`/`LlamaWrapper` (intent recognition, query generation, response generation)
Phase 2: Streaming STT & Wake-Word Detection (Status: In Progress)
Phase 3: Voice Macros & Browser Streaming (Status: Planned)
Phase 4: Multi-Language TTS & Biometric Authentication (Status: Planned)
Related Work
Scope Boundary
Mandatory Delivery Workflow (ThemisDB Rules)
Implementation Tasks
- `VoiceAssistant` central coordinator for all voice interaction
- `EmbeddedLLM`/`LlamaWrapper` (intent recognition, query generation, response generation)