docs: Course section — Voice Input/Output with STT and TTS #16

@rdwj

Description

Summary

Teach users how to add voice capabilities to their agents using speech-to-text (STT) and text-to-speech (TTS) sidecars. Voice support enables hands-free interaction while the core agent pipeline stays text-based, which keeps it reliable and easy to debug.
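The split described above can be sketched as a three-stage pipeline in which only the two edges touch audio. This is a minimal illustration, not the agent-template's implementation: `VoicePipeline`, `transcribe`, `run_agent`, and `synthesize` are hypothetical names, and in a real deployment the first and last callables would wrap HTTP calls to the STT and TTS sidecars.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VoicePipeline:
    """Hypothetical wrapper: audio in, audio out, plain text in the middle."""

    transcribe: Callable[[bytes], str]   # STT sidecar client (assumed interface)
    run_agent: Callable[[str], str]      # existing text-only agent, unchanged
    synthesize: Callable[[str], bytes]   # TTS sidecar client (assumed interface)
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, audio_in: bytes) -> Tuple[str, bytes]:
        # 1. Speech-to-text: the only step that reads raw input audio.
        question = self.transcribe(audio_in).strip()
        self.transcript.append(("user", question))

        # 2. The agent sees plain text, so prompts, tools, and logs
        #    behave exactly as they do for typed input.
        answer = self.run_agent(question)
        self.transcript.append(("agent", answer))

        # 3. Text-to-speech: the only step that produces output audio.
        return answer, self.synthesize(answer)
```

Because every hop between the sidecars is text, a bad transcription or a wrong answer can be diagnosed from `transcript` alone, without replaying audio. The text answer is returned alongside the audio so the UI can display it next to the playback control.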

Course Section Outline

  • Architecture overview — STT/TTS sidecars flanking the text-based agent pipeline
  • Deploying Granite Speech or Faster-Whisper for speech-to-text
  • Deploying Kokoro-FastAPI for text-to-speech synthesis
  • Configuring voice endpoints in agent.yaml
  • Gateway routing for audio requests — content type negotiation and streaming
  • UI microphone capture, audio playback, and push-to-talk integration
  • FIPS considerations for media transport and audio codec selection
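For the gateway routing item, content-type negotiation can be thought of as a small decision made from the request headers: audio uploads pass through the STT sidecar first, and the TTS sidecar is called only when the client's `Accept` header asks for audio. The function below is a hypothetical policy, not the gateway-template's code; it ignores `q` weights in `Accept` and does not cover streaming.

```python
from typing import Dict, Optional


def route_request(content_type: str, accept: Optional[str] = None) -> Dict[str, bool]:
    """Decide which hops a request needs (hypothetical routing policy)."""
    # Drop parameters such as "; codecs=opus" or "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type.startswith("audio/"):
        needs_stt = True
    elif media_type in ("application/json", "text/plain"):
        needs_stt = False
    else:
        raise ValueError(f"unsupported Content-Type: {content_type!r}")

    # Only synthesize speech when the client explicitly accepts audio.
    # Simplification: q-values (e.g. ";q=0.5") are ignored.
    wants_audio = False
    if accept:
        for part in accept.split(","):
            accepted = part.split(";", 1)[0].strip().lower()
            if accepted.startswith("audio/"):
                wants_audio = True
                break

    # The text agent always runs; STT and TTS are optional edges.
    return {"stt": needs_stt, "agent": True, "tts": wants_audio}
```

Keeping TTS opt-in through `Accept` means existing text clients see no behavior change when the voice sidecars are added.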

Lab Exercise

Deploy STT and TTS sidecars alongside an existing agent. Configure the gateway to route audio requests to the sidecars. Use the UI to record a voice question, observe the transcription, read the agent's text response, and hear the TTS playback. Test the complete voice conversation flow end-to-end.
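Before involving the UI, it can help to exercise each sidecar directly. The helpers below are a sketch that assumes both sidecars expose OpenAI-compatible audio endpoints (`POST /v1/audio/transcriptions` with a multipart upload for STT, `POST /v1/audio/speech` with a JSON body for TTS); the base URLs, model names, and voice are placeholders to replace with whatever the deployed sidecars actually use. Only the Python standard library is used, and the requests are built but not sent.

```python
import io
import json
import urllib.request
import uuid
import wave


def silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Generate a mono, 16-bit PCM WAV of silence for smoke tests."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def transcription_request(base_url: str, audio: bytes, model: str) -> urllib.request.Request:
    """Build a multipart upload for an assumed OpenAI-compatible STT endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n').encode(),
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="file"; filename="question.wav"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def speech_request(base_url: str, text: str, model: str, voice: str) -> urllib.request.Request:
    """Build a JSON request for an assumed OpenAI-compatible TTS endpoint."""
    payload = {"model": model, "input": text, "voice": voice, "response_format": "wav"}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To call a sidecar, pass the request to `urllib.request.urlopen`. Silent audio will not yield a meaningful transcript, so treat it only as a check that the route and upload format are accepted before recording real speech.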

Companion Issues

Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/ui-template, and fips-agents/fips-agents-cli.

Size

M

Metadata

Assignees: No one assigned
Labels: documentation (Improvements or additions to documentation)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests