[voice] Add dedicated STT latency and TTS generation speed benchmarks by Copilot · Pull Request #3442 · makr-code/ThemisDB

Copilot · 2026-03-01T17:55:21Z

Previously, `bench_voice_assistant.cpp` carried a `Stubs: 1` annotation and had no isolated STT or TTS benchmarks. The only measurements were full-pipeline runs of `processVoiceCommand()`, which bundle STT + LLM + TTS together, so a latency regression could not be attributed to a specific stage.
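To illustrate why isolated stage timings matter, here is a minimal C++ sketch (not code from this PR) that attributes a regression to a stage. It assumes one end-to-end `processVoiceCommand()` timing plus isolated STT and TTS timings, and treats whatever remains as LLM work plus glue overhead. `StageTimings`, `residualLlmAndOverheadMs`, and `largestRegression` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-stage timings for one voice command, in milliseconds.
struct StageTimings {
    double pipeline_ms;  // end-to-end processVoiceCommand() latency
    double stt_ms;       // isolated STTProcessor::transcribe() latency
    double tts_ms;       // isolated TTSProcessor::synthesize() latency
};

// Whatever the isolated STT and TTS numbers do not explain is attributed
// to the LLM stage plus pipeline glue.
double residualLlmAndOverheadMs(const StageTimings& t) {
    assert(t.pipeline_ms >= 0.0 && t.stt_ms >= 0.0 && t.tts_ms >= 0.0);
    double residual = t.pipeline_ms - t.stt_ms - t.tts_ms;
    // Stages are measured in separate runs, so noise can push this below zero.
    return residual > 0.0 ? residual : 0.0;
}

enum class Stage { STT, LLMOrOverhead, TTS };

// Compare a baseline run with a new run and report the stage that grew most.
Stage largestRegression(const StageTimings& before, const StageTimings& after) {
    double d_stt = after.stt_ms - before.stt_ms;
    double d_tts = after.tts_ms - before.tts_ms;
    double d_llm = residualLlmAndOverheadMs(after) - residualLlmAndOverheadMs(before);
    if (d_stt >= d_tts && d_stt >= d_llm) return Stage::STT;
    if (d_tts >= d_llm) return Stage::TTS;
    return Stage::LLMOrOverhead;
}
```

Because the three numbers come from separate benchmark runs, the residual is only an estimate; with full-pipeline numbers alone, even this rough split is impossible.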

Changes

`benchmarks/bench_voice_assistant.cpp`

STT Latency Benchmarks — exercise STTProcessor::transcribe() directly:

BM_STTLatency_Short — 1 s fixed baseline
BM_STTLatency_ByDuration — 0.5 / 1 / 5 / 30 / 60 s inputs (latency-vs-duration scaling)
BM_STTLatency_WithDiarization — speaker diarization path (1 / 5 / 30 s)
BM_STTLatency_Streaming — streamTranscribe() callback path (1 / 5 / 30 s)
BM_STTStatistics — getStatistics() overhead
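The duration sweep can be sketched with the standard library alone. This is an illustration, not the PR's benchmark code: `stubTranscribe` is a hypothetical stand-in for `STTProcessor::transcribe()` (whose real signature is not shown here), and the 16 kHz mono sample rate is an assumption. Input buffers are built outside the timed region so only the transcription call is measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

volatile std::size_t g_sink = 0;  // keeps results observable to the optimizer

// Hypothetical STT stand-in: touches every sample so cost grows with length.
std::string stubTranscribe(const std::vector<std::int16_t>& pcm) {
    std::int64_t acc = 0;
    for (std::int16_t s : pcm) acc += s;
    return acc == 0 ? std::string() : std::string("x");
}

struct LatencySample {
    int audio_duration_ms;   // same name as the PR's domain counter
    double mean_latency_ms;  // mean wall-clock time per transcribe call
};

// Times `iterations` calls per input duration (e.g. 0.5 / 1 / 5 / 30 / 60 s).
std::vector<LatencySample> sweepSttLatency(const std::vector<int>& durations_ms,
                                           int sample_rate_hz, int iterations) {
    assert(iterations > 0 && sample_rate_hz > 0);
    std::vector<LatencySample> out;
    for (int d : durations_ms) {
        // Fixture setup: build the input before starting the clock.
        std::vector<std::int16_t> pcm(
            static_cast<std::size_t>(d) * static_cast<std::size_t>(sample_rate_hz) / 1000, 1);
        std::size_t chars = 0;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i) chars += stubTranscribe(pcm).size();
        auto elapsed = std::chrono::duration<double, std::milli>(
            std::chrono::steady_clock::now() - start);
        g_sink = chars;
        out.push_back({d, elapsed.count() / iterations});
    }
    return out;
}
```

With a real STT backend, comparing `mean_latency_ms` against `audio_duration_ms` across the sweep would show whether latency grows roughly in proportion to input length or has a large fixed per-call cost.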

TTS Generation Speed Benchmarks — exercise TTSProcessor::synthesize() directly:

BM_TTSGenSpeed_Short — short phrase baseline
BM_TTSGenSpeed_ByLength — 50 / 200 / 500 / 2000 char texts
BM_TTSGenSpeed_WithOptions — speed variants 0.5× / 1.0× / 1.5× / 2.0×
BM_TTSGenSpeed_Streaming — streamSynthesize() callback throughput
BM_TTSAvailableVoices / BM_TTSStatistics — auxiliary overhead
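The streaming benchmarks measure throughput through a callback rather than a single return value. A minimal sketch of that pattern, assuming a chunk callback that receives raw audio bytes (the actual `streamSynthesize()` callback type is not documented in this PR); `stubStreamSynthesize`, the 32 bytes-per-character rate, and the 4096-byte chunk size are all hypothetical:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical chunk callback type.
using AudioChunkCallback = std::function<void(const std::vector<std::uint8_t>&)>;

// Stub streamer: emits `bytes_per_char` bytes of silent audio per input
// character, delivered in chunks of at most `chunk_bytes`.
void stubStreamSynthesize(const std::string& text, std::size_t bytes_per_char,
                          std::size_t chunk_bytes, const AudioChunkCallback& on_chunk) {
    assert(chunk_bytes > 0);
    std::size_t remaining = text.size() * bytes_per_char;
    while (remaining > 0) {
        std::size_t n = remaining < chunk_bytes ? remaining : chunk_bytes;
        on_chunk(std::vector<std::uint8_t>(n, 0));
        remaining -= n;
    }
}

struct StreamStats {
    std::size_t chunks = 0;
    std::size_t bytes = 0;
    double elapsed_s = 0.0;
    double bytesPerSecond() const { return elapsed_s > 0.0 ? bytes / elapsed_s : 0.0; }
};

// Counts callbacks and delivered bytes while timing the whole stream.
StreamStats measureStreaming(const std::string& text, std::size_t bytes_per_char = 32,
                             std::size_t chunk_bytes = 4096) {
    StreamStats st;
    auto start = std::chrono::steady_clock::now();
    stubStreamSynthesize(text, bytes_per_char, chunk_bytes,
                         [&](const std::vector<std::uint8_t>& chunk) {
                             ++st.chunks;  // one callback invocation per chunk
                             st.bytes += chunk.size();
                         });
    st.elapsed_s = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
    return st;
}
```

Calling `measureStreaming(std::string(n, 'a'))` for n = 50, 200, 500, and 2000 mirrors the `BM_TTSGenSpeed_ByLength` input sizes.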

Helpers added:

buildWavBlob(duration_ms) — generates a standards-compliant 16-bit PCM WAV of arbitrary duration (noise, deterministic seed) for use as STT input without requiring a real audio file
getSTTProcessor() / getTTSProcessor() — lazy-initialised singletons; initialization cost paid once per binary run, not per benchmark iteration
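A sketch of both helpers, under stated assumptions: the WAV is mono 16-bit PCM at an assumed 16 kHz with the canonical 44-byte RIFF header, and the noise amplitude and seed value are arbitrary choices. `ExpensiveProcessor` and `getProcessor()` are hypothetical stand-ins for the real processors; the function-local `static` gives lazy one-time construction that is thread-safe since C++11.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

namespace {
void putU32LE(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}
void putU16LE(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}
void putTag(std::vector<std::uint8_t>& out, const char (&tag)[5]) {
    out.insert(out.end(), tag, tag + 4);  // four ASCII bytes, no terminator
}
}  // namespace

// Hypothetical buildWavBlob sketch: RIFF/WAVE header + 16-bit PCM noise.
std::vector<std::uint8_t> buildWavBlob(std::uint32_t duration_ms,
                                       std::uint32_t sample_rate = 16000,
                                       std::uint32_t seed = 42) {
    const std::uint16_t channels = 1, bits = 16;
    const std::uint16_t block_align = static_cast<std::uint16_t>(channels * (bits / 8));
    const std::uint32_t num_samples =
        static_cast<std::uint32_t>(std::uint64_t{duration_ms} * sample_rate / 1000);
    const std::uint32_t data_bytes = num_samples * block_align;

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + data_bytes);
    putTag(wav, "RIFF"); putU32LE(wav, 36 + data_bytes); putTag(wav, "WAVE");
    putTag(wav, "fmt "); putU32LE(wav, 16);   // PCM fmt chunk body is 16 bytes
    putU16LE(wav, 1);                         // audio format 1 = PCM
    putU16LE(wav, channels);
    putU32LE(wav, sample_rate);
    putU32LE(wav, sample_rate * block_align); // byte rate
    putU16LE(wav, block_align);
    putU16LE(wav, bits);
    putTag(wav, "data"); putU32LE(wav, data_bytes);

    std::mt19937 rng(seed);                                 // fixed seed
    std::uniform_int_distribution<int> noise(-2000, 2000);  // low-amplitude noise
    for (std::uint32_t i = 0; i < num_samples; ++i)
        putU16LE(wav, static_cast<std::uint16_t>(static_cast<std::int16_t>(noise(rng))));
    return wav;
}

int g_processor_constructions = 0;  // counts constructions, for demonstration

struct ExpensiveProcessor {
    ExpensiveProcessor() { ++g_processor_constructions; }  // model loading would go here
};

// Constructed on the first call only; later calls return the same object.
ExpensiveProcessor& getProcessor() {
    static ExpensiveProcessor instance;
    return instance;
}
```

`std::mt19937` output is fixed by the C++ standard, but `std::uniform_int_distribution` is not, so the noise bytes are reproducible for a given standard library rather than across every toolchain.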

Each benchmark emits items_processed, bytes_processed (where relevant), and domain counters (audio_duration_ms, text_chars, speed_x10) for CI regression tracking.
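Assuming the file uses Google Benchmark (as the `BM_` naming and the `items_processed` / `bytes_processed` counters suggest), these values would normally be set with `state.SetItemsProcessed()`, `state.SetBytesProcessed()`, and `state.counters[...]`. The dependency-free sketch below only shows what each counter would hold for one TTS run; `ttsCounters` and `speedX10` are hypothetical helpers, and treating `bytes_processed` as output audio bytes is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Simple name -> value map standing in for a benchmark's user counters.
using Counters = std::map<std::string, double>;

// speed_x10 stores the TTS speed multiplier as an integer: 0.5x -> 5, 1.5x -> 15,
// which keeps the counter integral (the rationale is an assumption).
std::int64_t speedX10(double speed) {
    return static_cast<std::int64_t>(std::lround(speed * 10.0));
}

Counters ttsCounters(std::int64_t iterations, std::size_t text_chars,
                     std::size_t audio_bytes_per_call, double speed) {
    assert(iterations > 0);
    Counters c;
    c["items_processed"] = static_cast<double>(iterations);  // one synthesis per iteration
    c["bytes_processed"] = static_cast<double>(iterations) * audio_bytes_per_call;
    c["text_chars"] = static_cast<double>(text_chars);
    c["speed_x10"] = static_cast<double>(speedX10(speed));
    return c;
}
```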

File header updated: Stubs: 0, version 0.0.33, quality score 100.0.

`src/voice/ROADMAP.md`

Performance benchmark checklist item updated from [I] → [P].

Type of Change

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing performed

📚 Research & Knowledge (if applicable)

Is this PR based on scientific paper(s) or best practices?
- If YES: research files created in /docs/research/?
- If YES: linked in the module README under "Wissenschaftliche Grundlagen" (Scientific Foundations)?
- If YES: entered in /docs/research/implementation_influence/?

Relevant sources:

Paper:
Best Practice:
Architecture Decision:

Checklist

Code follows project style guidelines
Self-review completed
Documentation updated (if needed)
No new warnings introduced

Original prompt

This section details the original issue you should resolve

<issue_title>[voice] Improve STT latency and TTS gen speed benchmarks</issue_title>
<issue_description># [voice] Improve STT latency and TTS gen speed benchmarks

Summary

Module: voice
Item: Performance benchmarks (STT latency, TTS generation speed)

Implementation Context

Roadmap section: Production Readiness Checklist
Enhancement hint: Line 14: Open Issues: TODOs: 0, Stubs: 1

Implementation Phases

Phase 1: Voice Pipeline & Session Management (Status: Completed)

VoiceAssistant central coordinator for all voice interaction
STT processing via Whisper AI (speaker diarization, timestamps)
LLM integration via EmbeddedLLM / LlamaWrapper (intent recognition, query generation, response generation)
TTS synthesis with audio format output
Voice command processing pipeline (audio → STT → LLM → TTS → audio)
Session state and conversation history management
Phase 2: Streaming STT & Wake-Word Detection (Status: In Progress)
Real-time streaming STT (word-by-word transcription as audio arrives)
Wake-word detection for hands-free activation
Multi-speaker diarization improvements
Phase 3: Voice Macros & Browser Streaming (Status: Planned)
Voice command macros (user-defined shortcuts to AQL queries)
Language detection and automatic locale switching
Noise suppression preprocessing (RNNoise integration)
WebSocket audio streaming endpoint for browser clients
Voice session playback and search in stored transcripts
Phase 4: Multi-Language TTS & Biometric Authentication (Status: Planned)
Multi-language TTS (German, French, Spanish voices)
Emotion / sentiment detection from voice tone
Voice biometric authentication (speaker verification)
Real-time meeting transcription with action-item extraction
Integration with telephony systems (SIP / WebRTC)

Related Work

No direct overlap identified in pre-check.
Relationship is for coordination only; do not expand this issue scope.

Scope Boundary

In scope: implement only this roadmap item: "Performance benchmarks (STT latency, TTS generation speed)"
Out of scope: any other roadmap checklist item, even from the same module/section
Related issues/PRs are references only; implementation remains isolated to this issue
If additional work is required, open/link a follow-up issue instead of extending this issue

Mandatory Delivery Workflow (ThemisDB Rules)

Implementation Tasks

Fixes [voice] Improve STT latency and TTS gen speed benchmarks #2357


Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>

Initial plan

343058e

Copilot AI assigned Copilot and makr-code Mar 1, 2026

Copilot started work on behalf of makr-code March 1, 2026 18:02

feat(voice): add STT latency and TTS generation speed benchmarks

e0a7ce2

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Improve STT latency and TTS generation speed benchmarks" to "[voice] Add dedicated STT latency and TTS generation speed benchmarks" Mar 1, 2026

Copilot finished work on behalf of makr-code March 1, 2026 18:17

makr-code marked this pull request as ready for review March 1, 2026 18:22

makr-code merged commit 6c09d5e into develop Mar 1, 2026
11 checks passed

makr-code modified the milestones: v1.0.2, v1.5.0 Mar 11, 2026

makr-code merged 2 commits into `develop` from `copilot/improve-stt-latency-tts-speed`