feat(telegram): Voice pipeline refactor with STT integration and configurable routing #33
maemreyo wants to merge 9 commits
- Add dmAgentAffinity map for sticky DM routing to voice agent
- Add STT config fields (STTProxyURL, STTAPIKey, STTTenantID, STTTimeoutSec, VoiceAgentID)
- Implement looksLikeSpeakingIntent and looksLikeNonSpeakingIntent for smart routing
- Add session affinity with 6h TTL for DM conversations
- Improve STT URL handling with proper trimming
- Add logging for transcript attachment
- Add modelFallbacks to Loop config for fallback model support
- Implement callProviderWithFallback for automatic model switching on 429 errors
- Add modelCandidates helper to deduplicate primary + fallback models
- Add isRateLimitFailure detection for 429 status and common rate limit error messages
- Update emitLLMSpan to track actual model used in span
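A minimal sketch of how the fallback flow in this commit might fit together. The function names `modelCandidates` and `isRateLimitFailure` come from the commit message, but their signatures here are guesses; `providerError` and `callWithFallback` are hypothetical, and the matched error strings are assumptions rather than the PR's actual list.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// providerError is a hypothetical error type carrying an HTTP status code.
type providerError struct {
	Status int
	Msg    string
}

func (e *providerError) Error() string { return fmt.Sprintf("status %d: %s", e.Status, e.Msg) }

// modelCandidates returns the primary model followed by the fallbacks,
// skipping blank names and duplicates while preserving order.
func modelCandidates(primary string, fallbacks []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range append([]string{primary}, fallbacks...) {
		m = strings.TrimSpace(m)
		if m == "" || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

// isRateLimitFailure reports whether err is a 429 or reads like a rate-limit message.
// The matched phrases are assumptions for this sketch.
func isRateLimitFailure(err error) bool {
	if err == nil {
		return false
	}
	var pe *providerError
	if errors.As(err, &pe) && pe.Status == 429 {
		return true
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "rate limit") || strings.Contains(msg, "too many requests")
}

// callWithFallback tries each model in order and advances only on rate-limit failures.
// It returns the reply and the model that actually produced it, which a caller
// could then record in a tracing span.
func callWithFallback(models []string, call func(model string) (string, error)) (string, string, error) {
	var lastErr error
	for _, m := range models {
		reply, err := call(m)
		if err == nil {
			return reply, m, nil
		}
		lastErr = err
		if !isRateLimitFailure(err) {
			return "", m, err
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no model candidates")
	}
	return "", "", lastErr
}

func main() {
	models := modelCandidates("model-a", []string{"model-b", "model-a"})
	reply, used, err := callWithFallback(models, func(model string) (string, error) {
		if model == "model-a" {
			return "", &providerError{Status: 429, Msg: "slow down"}
		}
		return "ok", nil
	})
	fmt.Println(reply, used, err)
}
```

Non-rate-limit errors stop the loop immediately in this sketch, so a malformed request is not retried against every fallback model.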
- Change form field from 'file' to 'audio' for speaking-service contract
- Add default tenant_id fallback ('default') when not configured
- Add speaking-agent Telegram audio guard for student replies
- Add internal identity prompt for speaking-agent in DM
- Add sanitizeSpeakingAudioStudentReply to handle technical errors
- Update STT tests for new contract
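For readers unfamiliar with the contract change, here is a rough sketch of building the upload with the `audio` form field and the `default` tenant fallback. The `/transcribe_audio` path appears in the PR description; sending `tenant_id` as a form field, the helper name `buildSTTRequest`, and the file name are assumptions, and authentication is left out because the source does not describe it.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strings"
)

// buildSTTRequest assembles a multipart upload for a speech-to-text proxy.
// The audio bytes go in the "audio" form field (not the legacy "file" field).
func buildSTTRequest(proxyURL, tenantID, filename string, audio []byte) (*http.Request, error) {
	// Fall back to "default" when no tenant is configured.
	if strings.TrimSpace(tenantID) == "" {
		tenantID = "default"
	}

	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Assumption: the tenant travels as a plain form field named tenant_id.
	if err := w.WriteField("tenant_id", tenantID); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	// Trim whitespace and trailing slashes so the joined URL has exactly one slash.
	url := strings.TrimRight(strings.TrimSpace(proxyURL), "/") + "/transcribe_audio"
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildSTTRequest("https://stt.example.com/", "", "voice.ogg", []byte("fake-ogg-bytes"))
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```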
Replace hardcoded speaking-agent logic with configurable Telegram channel settings:
- VoiceStartMessage, VoiceIntentKeywords, VoiceAffinityClearKeywords, VoiceAffinityTTLMinutes
- VoiceDMContextTemplate (injects context with {user_id} substitution)
- AudioGuardFallbackTranscript/NoTranscript for custom fallback messages
- GOCLAW_STT_TENANT_ID and GOCLAW_VOICE_DM_CONTEXT_TEMPLATE env var overrides
This allows deployments to customize voice routing behavior and error fallback messages without code changes. Includes new tests for voice routing logic and audio guard sanitization.
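As an illustration of the override-then-substitute flow, the sketch below resolves the DM context template from the environment and fills in its placeholders. The environment variable name and the `{user_id}` / `{tenant_id}` placeholders come from the PR text; the helper names `resolveTemplate` and `renderDMContext` are hypothetical, and the rule that a non-empty env value wins over the configured one is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveTemplate prefers a non-empty environment override over the configured value.
func resolveTemplate(configured string) string {
	if v := strings.TrimSpace(os.Getenv("GOCLAW_VOICE_DM_CONTEXT_TEMPLATE")); v != "" {
		return v
	}
	return configured
}

// renderDMContext substitutes the {tenant_id} and {user_id} placeholders.
// Unknown placeholders are left untouched.
func renderDMContext(tmpl, tenantID, userID string) string {
	return strings.NewReplacer(
		"{tenant_id}", tenantID,
		"{user_id}", userID,
	).Replace(tmpl)
}

func main() {
	tmpl := resolveTemplate("Context:\n- tenant: {tenant_id}\n- user_id: {user_id}")
	fmt.Println(renderDMContext(tmpl, "acme", "12345"))
}
```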
- Replace fmt.Sprintf with strings.ReplaceAll in audio fallback template handling to prevent "%!(EXTRA string=...)" garbage when custom templates lack %s placeholder
- Lowercase config keywords defensively in matchesVoiceIntent and matchesAffinityClear since inbound text is normalized but DB keywords may have mixed case
- Add comprehensive test coverage for custom fallback templates with and without placeholders
- Add test cases for mixed-case keyword matching in voice intent and affinity-clear routing
- Ensure operators can safely configure keywords with any casing without breaking voice routing logic
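The two fixes in this commit can be illustrated with a small guard sketch. It is not the PR's implementation: `guardConfig`, `containsAny`, and `sanitizeReply` are hypothetical names, and substring matching is an assumption about how keywords and error markers are compared.

```go
package main

import (
	"fmt"
	"strings"
)

// guardConfig holds operator-supplied settings; field names are illustrative.
type guardConfig struct {
	ErrorMarkers         []string
	FallbackTranscript   string // may or may not contain a "%s" placeholder
	FallbackNoTranscript string
}

// containsAny lowercases both the text and each needle, so operators can
// store keywords in any casing. Blank needles are ignored.
func containsAny(text string, needles []string) bool {
	text = strings.ToLower(text)
	for _, n := range needles {
		n = strings.ToLower(strings.TrimSpace(n))
		if n != "" && strings.Contains(text, n) {
			return true
		}
	}
	return false
}

// sanitizeReply swaps a reply containing technical error language for a fallback.
// strings.ReplaceAll leaves a template without "%s" unchanged, whereas
// fmt.Sprintf would append "%!(EXTRA string=...)" to it.
func sanitizeReply(cfg guardConfig, reply, transcript string) string {
	if !containsAny(reply, cfg.ErrorMarkers) {
		return reply
	}
	if strings.TrimSpace(transcript) == "" {
		return cfg.FallbackNoTranscript
	}
	return strings.ReplaceAll(cfg.FallbackTranscript, "%s", transcript)
}

func main() {
	cfg := guardConfig{
		ErrorMarkers:         []string{"System Error", "rate limit"},
		FallbackTranscript:   "Got your voice: \"%s\". Please try again!",
		FallbackNoTranscript: "Sorry, I could not hear that. Please try again!",
	}
	fmt.Println(sanitizeReply(cfg, "system error: exit code 1", "hello teacher"))
	fmt.Println(sanitizeReply(cfg, "Nice pronunciation!", "hello teacher"))
}
```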
- Extract voice agent reply sanitization into new voiceguard package with Guard type
- Add voiceguard.SanitizeReply function to handle technical error detection and fallback messaging
- Refactor voice configuration from flat fields (VoiceAgentID, VoiceDMContextTemplate) to nested Voice struct
- Support both nested and flat JSON layouts in telegramInstanceConfig for backward compatibility
- Add sttSem field to Channel for bounding parallel STT HTTP calls
- Update gateway_consumer to use voiceguard package instead of inline sanitization logic
- Remove sanitizeVoiceAgentReply, containsTechnicalErrorLanguage, and extractTranscriptFromInbound functions from gateway_consumer
- Clean up unused imports (html, regexp) from gateway_consumer
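One way to accept both the nested and the flat layout is to decode both shapes and promote legacy values into the nested struct. The sketch below assumes the flat keys are named `voice_agent_id` and `voice_dm_context_template`, which the source does not confirm; the type and function names are likewise hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// voiceConfig is a trimmed-down stand-in for the nested voice settings.
type voiceConfig struct {
	AgentID           string `json:"agent_id,omitempty"`
	DMContextTemplate string `json:"dm_context_template,omitempty"`
}

// instanceConfig decodes both layouts. The flat key names are assumptions.
type instanceConfig struct {
	Voice                        *voiceConfig `json:"voice,omitempty"`
	LegacyVoiceAgentID           string       `json:"voice_agent_id,omitempty"`
	LegacyVoiceDMContextTemplate string       `json:"voice_dm_context_template,omitempty"`
}

// loadVoiceConfig returns the effective voice settings: nested values win,
// and legacy flat values fill any field the nested block left empty.
func loadVoiceConfig(raw []byte) (voiceConfig, error) {
	var c instanceConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return voiceConfig{}, err
	}
	var v voiceConfig
	if c.Voice != nil {
		v = *c.Voice
	}
	if v.AgentID == "" {
		v.AgentID = c.LegacyVoiceAgentID
	}
	if v.DMContextTemplate == "" {
		v.DMContextTemplate = c.LegacyVoiceDMContextTemplate
	}
	return v, nil
}

func main() {
	flat := []byte(`{"voice_agent_id": "speaking-agent"}`)
	nested := []byte(`{"voice": {"agent_id": "speaking-agent"}}`)

	a, _ := loadVoiceConfig(flat)
	b, _ := loadVoiceConfig(nested)
	fmt.Println(a.AgentID == b.AgentID)
}
```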
- Fix assignment operator from `=` to `:=` in handleMessage for proper variable declaration
- Remove duplicate test code block at end of handlers_voice_routing_test.go
- Clean up test file structure to eliminate redundant package declaration and imports
Looking forward to this!
… control (#33)
Cherry-pick two features from PR #33:
- Voiceguard: intercept technical errors (rate limits, tool failures, exit codes) in voice agent replies and replace with user-friendly fallback messages. Configurable error markers and fallback templates via TelegramConfig.
- STT: shared HTTP client with connection pooling (sync.Once) and concurrency semaphore (max 4 concurrent calls) to prevent STT proxy overload.
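The STT half of this cherry-pick combines two standard Go patterns, sketched below under stated assumptions. The capacity of 4 matches the commit message; the timeout, the transport setting, and all identifiers (`sharedSTTClient`, `withSTTSlot`, `peakConcurrency`) are illustrative, and the sleep stands in for the real HTTP round trip.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

var (
	sttClientOnce sync.Once
	sttClient     *http.Client
)

// sharedSTTClient lazily builds one client so its connection pool is reused
// by every caller. Timeout and pool size are assumed values.
func sharedSTTClient() *http.Client {
	sttClientOnce.Do(func() {
		sttClient = &http.Client{
			Timeout:   30 * time.Second,
			Transport: &http.Transport{MaxIdleConnsPerHost: 4},
		}
	})
	return sttClient
}

// sttSem is a counting semaphore: at most 4 STT calls may be in flight.
var sttSem = make(chan struct{}, 4)

// withSTTSlot blocks until a slot is free, runs fn, then releases the slot.
func withSTTSlot(fn func()) {
	sttSem <- struct{}{}
	defer func() { <-sttSem }()
	fn()
}

// peakConcurrency launches n simulated STT calls and reports the highest
// number that ever ran at the same time.
func peakConcurrency(n int) int32 {
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withSTTSlot(func() {
				cur := atomic.AddInt32(&inFlight, 1)
				for {
					old := atomic.LoadInt32(&peak)
					if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
						break
					}
				}
				time.Sleep(5 * time.Millisecond) // stand-in for the HTTP round trip
				atomic.AddInt32(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(sharedSTTClient() == sharedSTTClient())
	fmt.Println(peakConcurrency(16) <= 4)
}
```

A buffered channel is used as the semaphore here because it needs no extra dependency; a weighted semaphore from an external package would be an equally valid choice.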
Thank you for the comprehensive voice pipeline work in this PR! The voiceguard error sanitization and STT concurrency improvements have been cherry-picked and landed on main in commit 9f80842. What was merged:
What was deferred (YAGNI for now):
The deferred features can be revisited when there's a concrete need. Thanks again for the thorough implementation and test coverage; it made the cherry-pick straightforward! 🙏
Thx for the feedback and for merging the core parts! I agree with the YAGNI approach for now; keeping it simple is better. Will revisit the deferred features when we have a clear use case.
Extract wake_heartbeat and stateless from JSON payload into first-class columns on cron_jobs. Adds migration 000033 with backfill from existing payload data. Updates PG + SQLite stores, RPC handlers, and UI i18n.
* upstream/main: (23 commits)
  fix(ui): KG graph review fixes — edge colors, fetch limit, double fitView
  fix(ui): KG graph perf + theme support + entity limit
  perf(ui): optimize KG graph view for large graphs
  feat(ui): enhance knowledge graph with depth visualization and performance fixes (nextlevelbuilder#572)
  feat(cron): add stateless mode + promote payload fields to columns
  fix(agent): group session unresponsive during team task execution (nextlevelbuilder#266)
  refactor(cron): normalize payload columns into dedicated DB fields (nextlevelbuilder#33)
  fix(agent): add panic recovery to prevent zombie state after agent loop crash (nextlevelbuilder#39)
  fix(store): provider CreateProvider uses UPSERT to handle orphaned duplicates (nextlevelbuilder#295)
  fix(config): MCP env: resolution, channel field filter, orphan provider event, workspace fallback (nextlevelbuilder#348, nextlevelbuilder#297, nextlevelbuilder#295, nextlevelbuilder#431)
  fix(store): session Save() UPSERT fallback, memory index-all user_id header (nextlevelbuilder#379, nextlevelbuilder#517)
  fix(agent): unblock stuck agent on /stop, auto-complete nil-result tasks, graceful shutdown (nextlevelbuilder#527, nextlevelbuilder#504, nextlevelbuilder#39)
  fix(providers): prevent crash on cancel, capture thinking signature, nil guard (nextlevelbuilder#287, nextlevelbuilder#188, nextlevelbuilder#566, nextlevelbuilder#335)
  fix(security): harden env file permissions and block NUL byte injection (nextlevelbuilder#306, nextlevelbuilder#44)
  fix(config): add GOCLAW_ALLOWED_ORIGINS env var for CORS config (nextlevelbuilder#543)
  fix: add missing MigrateUserDataOnMerge to test stubs
  fix(cron): resolve default agent from DB and fix update payload (nextlevelbuilder#549) (nextlevelbuilder#562)
  fix: Telegram @mention linking + conditional media read prompts
  fix: dep_scanner false positive — JS import inside Python f-string (nextlevelbuilder#564)
  fix: propagate TenantID in team task dispatch InboundMessage
  ...
Resolves vellus-ai/vellus-ai-agents-platform#33: rebuild the Docker image combining the security patches (v0.1.1-sec) with the embedded React SPA (v1.79.0-webui).
Analysis:
- main HEAD (402d322) already contains everything: appsec patches (#14, #21, #22) + embed-web-ui
- v0.1.1-sec was built before the embed-web-ui merge (it lacks the SPA)
- v1.79.0-webui was built from the branch (pre-squash), without the differences in the final commit
- The correct image requires building main HEAD with ENABLE_WEB_UI=true
Changes:
- .github/workflows/docker-publish.yaml: adds a "webui" variant (-webui suffix) with ENABLE_WEB_UI=true; adds an enable_web_ui field to all existing variants
- .github/workflows/rebuild-webui-hardened.yml: dedicated workflow for an immediate rebuild (trigger: push to this branch or workflow_dispatch); produces the v1.79.1-webui tag in GAR; documents the included security patches in the job summary
Next step: after merge, run the workflow and update the K8s deployment to v1.79.1-webui.
Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
release: promote v3.12.0 official
🎯 Overview
This PR introduces a comprehensive refactor of the Telegram voice pipeline, improving code organization, testability, and adding robust voice agent routing capabilities with STT (Speech-to-Text) integration.
✨ Key Features
1. Nested Voice Configuration Structure
- `TelegramVoiceConfig` to group all voice-related settings under a single `voice` JSON key
2. Voice Agent Routing System
3. STT (Speech-to-Text) Integration
- `/transcribe_audio` endpoint
4. Audio Guard System
- `voiceguard` package for better separation of concerns
5. Enhanced Testability
- `resolveTargetAgent()` extracted as pure function (no I/O side effects)
- `-race` flag support
6. Agent Loop Improvements
- `ForwardMedia` field for delegation artifact forwarding
📊 Changes Summary
New Files
- `internal/channels/telegram/voiceguard/guard.go` - Audio guard logic
- `internal/channels/telegram/voiceguard/guard_test.go` - Audio guard tests (13 tests)
- `internal/channels/telegram/handlers_voice_routing_test.go` - Voice routing tests (14 tests)
- `internal/config/config_load_voice_test.go` - Voice config tests
- `internal/agent/loop_fallback_test.go` - Model fallback tests
- `cmd/gateway_consumer_audio_sanitize_test.go` - Audio sanitization tests
Modified Files
- `internal/config/config_channels.go` - New `TelegramVoiceConfig` struct
- `internal/channels/telegram/factory.go` - Legacy config promotion logic
- `internal/channels/telegram/handlers.go` - Voice routing implementation
- `internal/channels/telegram/stt.go` - STT concurrency control & HTTP client pooling
- `internal/agent/loop.go` - ForwardMedia support & improved structure
- `cmd/gateway_consumer.go` - Integration with voiceguard package
🔄 Migration Path
For Existing Deployments
No immediate action required! The refactor is fully backward compatible:
For New Deployments
Use the nested structure for cleaner configuration:
    {
      "voice": {
        "agent_id": "speaking-agent",
        "stt_proxy_url": "https://stt.example.com",
        "stt_api_key": "secret-key",
        "intent_keywords": ["speaking", "pronunciation"],
        "affinity_clear_keywords": ["homework", "payment"],
        "affinity_ttl_minutes": 360,
        "dm_context_template": "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}",
        "audio_guard_fallback_transcript": "🎙️ Got your voice: \"%s\". Please try again!",
        "audio_guard_error_markers": ["system error", "rate limit"]
      }
    }
🧪 Testing
All tests pass:
Run with race detector:
    go test ./internal/channels/telegram/... -race -v
🔧 Environment Variables
New environment variable support:
- `GOCLAW_VOICE_AGENT_ID` - Override voice agent ID
- `GOCLAW_STT_TENANT_ID` - Override STT tenant ID
- `GOCLAW_VOICE_DM_CONTEXT_TEMPLATE` - Override DM context template
- `GOCLAW_AUDIO_GUARD_FALLBACK_TRANSCRIPT` - Override transcript fallback
- `GOCLAW_AUDIO_GUARD_FALLBACK_NO_TRANSCRIPT` - Override no-transcript fallback
📝 Documentation
Voice Routing Priority Chain
- `/start` or `start` text (DM only) → voice agent + rewrite content
Audio Guard Behavior
🐛 Bug Fixes
- `resolveTargetAgent` call
- `audio` field (not legacy `file` field)
🔍 Code Quality
📚 Related Issues
Closes: (if any issue numbers)
🙏 Acknowledgments
This refactor builds upon the existing voice pipeline foundation and improves it with better structure, testability, and configurability for production deployments.