feat(telegram): Voice pipeline refactor with STT integration and configurable routing by maemreyo · Pull Request #33 · nextlevelbuilder/goclaw

maemreyo · 2026-03-01T13:38:13Z

🎯 Overview

This PR refactors the Telegram voice pipeline to improve code organization and testability, and adds configurable voice agent routing with STT (Speech-to-Text) integration.

✨ Key Features

1. Nested Voice Configuration Structure

Introduced TelegramVoiceConfig to group all voice-related settings under a single voice JSON key
Clear separation between base channel settings and voice pipeline configuration
Backward compatible with legacy flat config layout via automatic promotion

2. Voice Agent Routing System

Configurable voice agent routing with priority-based decision chain:
1. Audio/Voice Media → Always routes to voice agent (highest priority)
2. /start Command → Bootstraps voice session with customizable message
3. Intent Keywords → Text-based routing via configurable keyword matching
4. Session Affinity → Sticky routing with TTL-based expiration
5. Affinity Clear Keywords → User-initiated switch back to default agent
DM-only routing logic (groups excluded except for audio media)
Case-insensitive keyword matching with defensive normalization

3. STT (Speech-to-Text) Integration

Multipart form-data contract with /transcribe_audio endpoint
Bearer token authentication support
Tenant ID forwarding for multi-tenant deployments
Configurable timeout (default: 30s)
Concurrency control via buffered-channel semaphore (max 4 concurrent calls per channel)
Shared HTTP client with connection pooling for performance
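The request contract listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `sttRequest` type, the `buildSTTRequest` helper, and the `tenant_id` form field name are assumptions; only the `/transcribe_audio` path, the `audio` field, Bearer authentication, and the 30s default timeout come from the description.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strings"
	"time"
)

// sttRequest holds illustrative inputs; the type and field names are hypothetical.
type sttRequest struct {
	BaseURL  string
	APIKey   string
	TenantID string
	Filename string
	Audio    []byte
}

// buildSTTRequest assembles a multipart POST to <BaseURL>/transcribe_audio.
func buildSTTRequest(r sttRequest) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// The audio bytes go into the "audio" form field.
	part, err := w.CreateFormFile("audio", r.Filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(r.Audio); err != nil {
		return nil, err
	}
	// Assumption: the tenant ID is forwarded as a plain form field.
	if r.TenantID != "" {
		if err := w.WriteField("tenant_id", r.TenantID); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	url := strings.TrimRight(strings.TrimSpace(r.BaseURL), "/") + "/transcribe_audio"
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	if r.APIKey != "" {
		req.Header.Set("Authorization", "Bearer "+r.APIKey)
	}
	return req, nil
}

func main() {
	req, err := buildSTTRequest(sttRequest{
		BaseURL:  "https://stt.example.com/",
		APIKey:   "secret-key",
		TenantID: "default",
		Filename: "voice.ogg",
		Audio:    []byte("fake-audio-bytes"),
	})
	if err != nil {
		panic(err)
	}
	// A client with the documented default timeout; the request is not sent here.
	client := &http.Client{Timeout: 30 * time.Second}
	fmt.Println(req.Method, req.URL.String(), client.Timeout)
}
```

Trimming the base URL before appending the path keeps a trailing slash or stray whitespace in the configured proxy URL from producing a malformed endpoint.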

4. Audio Guard System

Extracted into dedicated voiceguard package for better separation of concerns
Zero dependencies on Telegram SDK or message bus
Pure string→string transformation for easy unit testing
Intercepts technical error language in voice agent replies
User-friendly fallback messages with transcript support
Customizable error markers (replaces built-in defaults when set)
Supports both English and Vietnamese error detection
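As a rough sketch of the "pure string→string" idea described above, a guard of this kind might look like the following. The function name `sanitizeReply`, its signature, and the marker list are hypothetical and do not reproduce the voiceguard package's API or its built-in defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMarkers is an illustrative list, not the package's real defaults.
var defaultMarkers = []string{"rate limit", "exit code", "tool failed"}

// sanitizeReply returns reply unchanged unless it contains an error marker,
// in which case it returns fallback. Custom markers, when provided,
// replace the defaults rather than extending them.
func sanitizeReply(reply, fallback string, customMarkers []string) string {
	markers := defaultMarkers
	if len(customMarkers) > 0 {
		markers = customMarkers
	}
	lower := strings.ToLower(reply)
	for _, m := range markers {
		m = strings.ToLower(strings.TrimSpace(m))
		if m != "" && strings.Contains(lower, m) {
			return fallback
		}
	}
	return reply
}

func main() {
	fallback := "Sorry, something went wrong. Please try again!"
	fmt.Println(sanitizeReply("Error: Rate Limit exceeded (429)", fallback, nil))
	fmt.Println(sanitizeReply("Nice pronunciation!", fallback, nil))
}
```

Because the function touches no Telegram SDK or message bus types, it can be covered by plain table-driven tests.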

5. Enhanced Testability

resolveTargetAgent() extracted as pure function (no I/O side effects)
14 table-driven test cases covering all routing scenarios
Race condition testing with -race flag support
13 unit tests for audio guard logic
Comprehensive STT test coverage

6. Agent Loop Improvements

Rate limit model fallback support
ForwardMedia field for delegation artifact forwarding
Improved error handling and tracing

📊 Changes Summary

18 files changed
+2020 insertions
-247 deletions

New Files

internal/channels/telegram/voiceguard/guard.go - Audio guard logic
internal/channels/telegram/voiceguard/guard_test.go - Audio guard tests (13 tests)
internal/channels/telegram/handlers_voice_routing_test.go - Voice routing tests (14 tests)
internal/config/config_load_voice_test.go - Voice config tests
internal/agent/loop_fallback_test.go - Model fallback tests
cmd/gateway_consumer_audio_sanitize_test.go - Audio sanitization tests

Modified Files

internal/config/config_channels.go - New TelegramVoiceConfig struct
internal/channels/telegram/factory.go - Legacy config promotion logic
internal/channels/telegram/handlers.go - Voice routing implementation
internal/channels/telegram/stt.go - STT concurrency control & HTTP client pooling
internal/agent/loop.go - ForwardMedia support & improved structure
cmd/gateway_consumer.go - Integration with voiceguard package

🔄 Migration Path

For Existing Deployments

No immediate action required! The refactor is fully backward compatible:

Existing DB rows with flat config layout continue to work
Legacy fields are automatically promoted to nested structure on load
No database migration needed
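The promotion of legacy flat fields could be sketched as below. This is an assumption-laden illustration: the struct names and the legacy JSON key `voice_agent_id` are hypothetical, and the real telegramInstanceConfig may handle more fields and different precedence.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// voiceConfig mirrors a small subset of the nested "voice" object.
type voiceConfig struct {
	AgentID     string `json:"agent_id,omitempty"`
	STTProxyURL string `json:"stt_proxy_url,omitempty"`
}

// instanceConfig accepts both layouts. The legacy key names are assumptions.
type instanceConfig struct {
	Voice              *voiceConfig `json:"voice,omitempty"`
	LegacyVoiceAgentID string       `json:"voice_agent_id,omitempty"`
	LegacySTTProxyURL  string       `json:"stt_proxy_url,omitempty"`
}

// loadVoice decodes raw JSON and promotes flat fields into the nested
// struct. In this sketch, nested values win when both are present.
func loadVoice(raw []byte) (voiceConfig, error) {
	var c instanceConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return voiceConfig{}, err
	}
	var v voiceConfig
	if c.Voice != nil {
		v = *c.Voice
	}
	if v.AgentID == "" {
		v.AgentID = c.LegacyVoiceAgentID
	}
	if v.STTProxyURL == "" {
		v.STTProxyURL = c.LegacySTTProxyURL
	}
	return v, nil
}

func main() {
	flat := []byte(`{"voice_agent_id":"speaking-agent","stt_proxy_url":"https://stt.example.com"}`)
	nested := []byte(`{"voice":{"agent_id":"speaking-agent","stt_proxy_url":"https://stt.example.com"}}`)
	for _, raw := range [][]byte{flat, nested} {
		v, err := loadVoice(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", v)
	}
}
```

Both inputs decode to the same nested value, which is what lets existing rows keep working without a database migration.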

For New Deployments

Use the nested structure for cleaner configuration:

{
  "voice": {
    "agent_id": "speaking-agent",
    "stt_proxy_url": "https://stt.example.com",
    "stt_api_key": "secret-key",
    "intent_keywords": ["speaking", "pronunciation"],
    "affinity_clear_keywords": ["homework", "payment"],
    "affinity_ttl_minutes": 360,
    "dm_context_template": "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}",
    "audio_guard_fallback_transcript": "🎙️ Got your voice: \"%s\". Please try again!",
    "audio_guard_error_markers": ["system error", "rate limit"]
  }
}

🧪 Testing

All tests pass:

✅ internal/channels/telegram - 14 routing tests
✅ internal/channels/telegram/voiceguard - 13 audio guard tests  
✅ internal/channels/telegram - STT tests updated
✅ cmd - audio sanitization tests

Run with race detector:

go test ./internal/channels/telegram/... -race -v

🔧 Environment Variables

New environment variable support:

GOCLAW_VOICE_AGENT_ID - Override voice agent ID
GOCLAW_STT_TENANT_ID - Override STT tenant ID
GOCLAW_VOICE_DM_CONTEXT_TEMPLATE - Override DM context template
GOCLAW_AUDIO_GUARD_FALLBACK_TRANSCRIPT - Override transcript fallback
GOCLAW_AUDIO_GUARD_FALLBACK_NO_TRANSCRIPT - Override no-transcript fallback
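One plausible way to apply these overrides is sketched below. The `voiceSettings` struct and `applyEnvOverrides` helper are hypothetical, and the sketch assumes an empty environment variable is treated as unset; the PR may resolve precedence differently.

```go
package main

import (
	"fmt"
	"os"
)

// voiceSettings is an illustrative container for the overridable values.
type voiceSettings struct {
	AgentID              string
	STTTenantID          string
	DMContextTemplate    string
	FallbackTranscript   string
	FallbackNoTranscript string
}

// applyEnvOverrides replaces a config value when the matching environment
// variable is non-empty. getenv is injected so the logic is testable.
func applyEnvOverrides(v *voiceSettings, getenv func(string) string) {
	override := func(dst *string, key string) {
		if val := getenv(key); val != "" {
			*dst = val
		}
	}
	override(&v.AgentID, "GOCLAW_VOICE_AGENT_ID")
	override(&v.STTTenantID, "GOCLAW_STT_TENANT_ID")
	override(&v.DMContextTemplate, "GOCLAW_VOICE_DM_CONTEXT_TEMPLATE")
	override(&v.FallbackTranscript, "GOCLAW_AUDIO_GUARD_FALLBACK_TRANSCRIPT")
	override(&v.FallbackNoTranscript, "GOCLAW_AUDIO_GUARD_FALLBACK_NO_TRANSCRIPT")
}

func main() {
	v := voiceSettings{AgentID: "speaking-agent"}
	applyEnvOverrides(&v, os.Getenv)
	fmt.Printf("%+v\n", v)
}
```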

📝 Documentation

Voice Routing Priority Chain

Audio/voice media present → voice agent (applies to groups too)
/start or start text (DM only) → voice agent + rewrite content
Text matches intent keywords (DM only) → voice agent + set affinity
Existing non-expired affinity (DM only) → continue routing to affinity agent
Text matches clear keywords (DM only) → evict affinity, route to default
Fallback → default agent
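The chain above might be expressed as a pure function along these lines. All type and field names here are hypothetical. One deliberate deviation is labeled in the code: the sketch checks the clear keywords before honoring an active affinity, since otherwise the clear step could never fire while an affinity exists; the PR's actual ordering may differ. The content rewrite for /start is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

type routeInput struct {
	Text           string
	HasAudio       bool
	IsDM           bool
	AffinityActive bool // a non-expired affinity exists for this DM
}

type routeConfig struct {
	DefaultAgent, VoiceAgent      string
	IntentKeywords, ClearKeywords []string
}

type routeDecision struct {
	Agent         string
	SetAffinity   bool
	ClearAffinity bool
}

// containsAny lowercases keywords defensively; text must already be lowercase.
func containsAny(text string, keywords []string) bool {
	for _, k := range keywords {
		k = strings.ToLower(strings.TrimSpace(k))
		if k != "" && strings.Contains(text, k) {
			return true
		}
	}
	return false
}

// resolveTargetAgent performs no I/O: the caller applies the affinity changes.
func resolveTargetAgent(in routeInput, cfg routeConfig) routeDecision {
	def := routeDecision{Agent: cfg.DefaultAgent}
	if cfg.VoiceAgent == "" {
		return def
	}
	if in.HasAudio { // applies to groups too
		return routeDecision{Agent: cfg.VoiceAgent}
	}
	if !in.IsDM {
		return def
	}
	text := strings.ToLower(strings.TrimSpace(in.Text))
	switch {
	case text == "/start" || text == "start":
		return routeDecision{Agent: cfg.VoiceAgent}
	case containsAny(text, cfg.IntentKeywords):
		return routeDecision{Agent: cfg.VoiceAgent, SetAffinity: true}
	case in.AffinityActive && containsAny(text, cfg.ClearKeywords):
		// Assumption: clear keywords are evaluated before sticky routing.
		return routeDecision{Agent: cfg.DefaultAgent, ClearAffinity: true}
	case in.AffinityActive:
		return routeDecision{Agent: cfg.VoiceAgent}
	}
	return def
}

func main() {
	cfg := routeConfig{
		DefaultAgent:   "default",
		VoiceAgent:     "speaking-agent",
		IntentKeywords: []string{"Speaking", "pronunciation"},
		ClearKeywords:  []string{"homework", "payment"},
	}
	d := resolveTargetAgent(routeInput{Text: "I want speaking practice", IsDM: true}, cfg)
	fmt.Printf("%+v\n", d)
}
```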

Audio Guard Behavior

Only triggers for voice agent on Telegram DMs with audio/voice media
Checks reply for technical error language
Replaces with user-friendly fallback when error detected
Supports custom error markers (replaces defaults when set)
Extracts and includes transcript in fallback when available

🐛 Bug Fixes

Fixed group affinity leak (affinity no longer stored for group chats)
Fixed variable assignment in resolveTargetAgent call
Normalized voice routing keywords to lowercase for case-insensitive matching
Fixed STT contract to use audio field (not legacy file field)

🔍 Code Quality

Zero breaking changes for existing deployments
Comprehensive test coverage (27 new tests)
Clear separation of concerns (voiceguard package)
Improved code organization and maintainability
Detailed inline documentation
Performance optimizations (HTTP client pooling, concurrency control)

📚 Related Issues

Closes: (if any issue numbers)

🙏 Acknowledgments

This refactor builds upon the existing voice pipeline foundation and improves it with better structure, testability, and configurability for production deployments.

- Add dmAgentAffinity map for sticky DM routing to voice agent
- Add STT config fields (STTProxyURL, STTAPIKey, STTTenantID, STTTimeoutSec, VoiceAgentID)
- Implement looksLikeSpeakingIntent and looksLikeNonSpeakingIntent for smart routing
- Add session affinity with 6h TTL for DM conversations
- Improve STT URL handling with proper trimming
- Add logging for transcript attachment

- Add modelFallbacks to Loop config for fallback model support
- Implement callProviderWithFallback for automatic model switching on 429 errors
- Add modelCandidates helper to deduplicate primary + fallback models
- Add isRateLimitFailure detection for 429 status and common rate limit error messages
- Update emitLLMSpan to track actual model used in span

- Change form field from 'file' to 'audio' for speaking-service contract
- Add default tenant_id fallback ('default') when not configured
- Add speaking-agent Telegram audio guard for student replies
- Add internal identity prompt for speaking-agent in DM
- Add sanitizeSpeakingAudioStudentReply to handle technical errors
- Update STT tests for new contract

Replace hardcoded speaking-agent logic with configurable Telegram channel settings:
- VoiceStartMessage, VoiceIntentKeywords, VoiceAffinityClearKeywords, VoiceAffinityTTLMinutes
- VoiceDMContextTemplate (injects context with {user_id} substitution)
- AudioGuardFallbackTranscript/NoTranscript for custom fallback messages
- GOCLAW_STT_TENANT_ID and GOCLAW_VOICE_DM_CONTEXT_TEMPLATE env var overrides
This allows deployments to customize voice routing behavior and error fallback messages without code changes. Includes new tests for voice routing logic and audio guard sanitization.
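The placeholder substitution for the DM context template might look like the sketch below. The `renderDMContext` helper is hypothetical; the {tenant_id} and {user_id} placeholders are taken from the example configuration in the PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// renderDMContext fills the {tenant_id} and {user_id} placeholders.
// Unknown placeholders are left untouched.
func renderDMContext(tmpl, tenantID, userID string) string {
	r := strings.NewReplacer("{tenant_id}", tenantID, "{user_id}", userID)
	return r.Replace(tmpl)
}

func main() {
	tmpl := "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}"
	fmt.Println(renderDMContext(tmpl, "default", "12345"))
}
```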

- Replace fmt.Sprintf with strings.ReplaceAll in audio fallback template handling to prevent "%!(EXTRA string=...)" garbage when custom templates lack %s placeholder
- Lowercase config keywords defensively in matchesVoiceIntent and matchesAffinityClear since inbound text is normalized but DB keywords may have mixed case
- Add comprehensive test coverage for custom fallback templates with and without placeholders
- Add test cases for mixed-case keyword matching in voice intent and affinity-clear routing
- Ensure operators can safely configure keywords with any casing without breaking voice routing logic
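The first fix in this commit can be illustrated with a short sketch. The `renderFallback` name is hypothetical; the behavior it avoids is standard fmt behavior, where passing an argument to a format string with no verb appends a "%!(EXTRA ...)" marker to the output.

```go
package main

import (
	"fmt"
	"strings"
)

// renderFallback substitutes the transcript for every "%s" in tmpl.
// Unlike fmt.Sprintf, a template without "%s" is returned unchanged
// instead of gaining a "%!(EXTRA string=...)" suffix.
func renderFallback(tmpl, transcript string) string {
	return strings.ReplaceAll(tmpl, "%s", transcript)
}

func main() {
	withPlaceholder := "Got your voice: \"%s\". Please try again!"
	withoutPlaceholder := "Please try again!"
	fmt.Println(renderFallback(withPlaceholder, "hello"))
	fmt.Println(renderFallback(withoutPlaceholder, "hello"))
}
```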

- Extract voice agent reply sanitization into new voiceguard package with Guard type
- Add voiceguard.SanitizeReply function to handle technical error detection and fallback messaging
- Refactor voice configuration from flat fields (VoiceAgentID, VoiceDMContextTemplate) to nested Voice struct
- Support both nested and flat JSON layouts in telegramInstanceConfig for backward compatibility
- Add sttSem field to Channel for bounding parallel STT HTTP calls
- Update gateway_consumer to use voiceguard package instead of inline sanitization logic
- Remove sanitizeVoiceAgentReply, containsTechnicalErrorLanguage, and extractTranscriptFromInbound functions from gateway_consumer
- Clean up unused imports (html, regexp) from gateway_consumer

- Fix assignment operator from `=` to `:=` in handleMessage for proper variable declaration
- Remove duplicate test code block at end of handlers_voice_routing_test.go
- Clean up test file structure to eliminate redundant package declaration and imports

Erudition · 2026-03-11T05:06:06Z

Looking forward to this!

… control (#33)
Cherry-pick two features from PR #33:
- Voiceguard: intercept technical errors (rate limits, tool failures, exit codes) in voice agent replies and replace with user-friendly fallback messages. Configurable error markers and fallback templates via TelegramConfig.
- STT: shared HTTP client with connection pooling (sync.Once) and concurrency semaphore (max 4 concurrent calls) to prevent STT proxy overload.

viettranx · 2026-03-20T00:50:32Z

Thank you for the comprehensive voice pipeline work in this PR! The voiceguard error sanitization and STT concurrency improvements have been cherry-picked and landed on main in commit 9f80842.

What was merged:

Voiceguard package — intercepts technical errors in voice agent replies with user-friendly fallbacks
STT shared HTTP client + concurrency semaphore (max 4 concurrent calls)

What was deferred (YAGNI for now):

Nested TelegramVoiceConfig — flat config works fine, no migration needed
Advanced voice routing (intent keywords, session affinity) — basic VoiceAgentID routing handles current needs
Rate limit model fallback — providers already have RetryDo(), this should live at the provider layer

The deferred features can be revisited when there's a concrete need. Thanks again for the thorough implementation and test coverage — it made the cherry-pick straightforward! 🙏

maemreyo · 2026-03-20T04:14:41Z

Thanks for the feedback and for merging the core parts! I agree with the YAGNI approach for now; keeping it simple is better. I will revisit the deferred features when we have a clear use case.

Extract wake_heartbeat and stateless from JSON payload into first-class columns on cron_jobs. Adds migration 000033 with backfill from existing payload data. Updates PG + SQLite stores, RPC handlers, and UI i18n.

* upstream/main: (23 commits)
fix(ui): KG graph review fixes — edge colors, fetch limit, double fitView
fix(ui): KG graph perf + theme support + entity limit
perf(ui): optimize KG graph view for large graphs
feat(ui): enhance knowledge graph with depth visualization and performance fixes (nextlevelbuilder#572)
feat(cron): add stateless mode + promote payload fields to columns
fix(agent): group session unresponsive during team task execution (nextlevelbuilder#266)
refactor(cron): normalize payload columns into dedicated DB fields (nextlevelbuilder#33)
fix(agent): add panic recovery to prevent zombie state after agent loop crash (nextlevelbuilder#39)
fix(store): provider CreateProvider uses UPSERT to handle orphaned duplicates (nextlevelbuilder#295)
fix(config): MCP env: resolution, channel field filter, orphan provider event, workspace fallback (nextlevelbuilder#348, nextlevelbuilder#297, nextlevelbuilder#295, nextlevelbuilder#431)
fix(store): session Save() UPSERT fallback, memory index-all user_id header (nextlevelbuilder#379, nextlevelbuilder#517)
fix(agent): unblock stuck agent on /stop, auto-complete nil-result tasks, graceful shutdown (nextlevelbuilder#527, nextlevelbuilder#504, nextlevelbuilder#39)
fix(providers): prevent crash on cancel, capture thinking signature, nil guard (nextlevelbuilder#287, nextlevelbuilder#188, nextlevelbuilder#566, nextlevelbuilder#335)
fix(security): harden env file permissions and block NUL byte injection (nextlevelbuilder#306, nextlevelbuilder#44)
fix(config): add GOCLAW_ALLOWED_ORIGINS env var for CORS config (nextlevelbuilder#543)
fix: add missing MigrateUserDataOnMerge to test stubs
fix(cron): resolve default agent from DB and fix update payload (nextlevelbuilder#549) (nextlevelbuilder#562)
fix: Telegram @mention linking + conditional media read prompts
fix: dep_scanner false positive — JS import inside Python f-string (nextlevelbuilder#564)
fix: propagate TenantID in team task dispatch InboundMessage
...

Resolves vellus-ai/vellus-ai-agents-platform#33: rebuild the Docker image combining the security patches (v0.1.1-sec) with the embedded React SPA (v1.79.0-webui).
Analysis:
- main HEAD (402d322) already contains everything: appsec patches (#14, #21, #22) + embed-web-ui
- v0.1.1-sec was built before the embed-web-ui merge (it lacks the SPA)
- v1.79.0-webui was built from the branch (pre-squash), without the differences from the final commit
- The correct image requires building main HEAD with ENABLE_WEB_UI=true
Changes:
- .github/workflows/docker-publish.yaml: adds a "webui" variant (-webui suffix) with ENABLE_WEB_UI=true; adds an enable_web_ui field to all existing variants
- .github/workflows/rebuild-webui-hardened.yml: dedicated workflow for an immediate rebuild (trigger: push to this branch or workflow_dispatch); produces the tag v1.79.1-webui in GAR; documents the included security patches in the job summary
Next step: after merge, run the workflow and update the K8s deployment to v1.79.1-webui.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

release: promote v3.12.0 official

maemreyo added 9 commits March 1, 2026 20:04

chore: remove patch files after apply

6d08429

chore: remove patch files after apply

6996c42

viettranx closed this Mar 20, 2026

MiltonSilvaJr mentioned this pull request Mar 30, 2026

ci: add webui variant + rebuild-webui-hardened workflow (fix issue #33) vellus-ai/argoclaw#27

Merged

5 tasks

mrgoonie mentioned this pull request May 20, 2026

release: promote v3.12.0 #1165

Merged

mrgoonie added a commit that referenced this pull request May 20, 2026

Merge pull request #33 from digitopvn/dev

0c7add6

release: promote v3.12.0 official

maemreyo wants to merge 9 commits into nextlevelbuilder:main from maemreyo:maemreyo/telegram-voice-stt

4 participants