An MCP plugin that enables voice calls and chat messaging for AI coding assistants. Start a task, walk away. Your phone rings when the AI is done, stuck, or needs a decision. Or get notified via Discord, Telegram, or WhatsApp.
Supports: Claude Code, AWS Kiro CLI, Gemini CLI
Built with the plexusone stack - showcasing a complete voice and chat AI architecture in Go.
- 📞 Phone Calls: Real voice calls to your phone via Twilio—works with smartphones, smartwatches, landlines, or VoIP
- 💬 Chat Messaging: Send messages via Discord, Telegram, or WhatsApp
- 🔄 Multi-turn Conversations: Back-and-forth discussions, not just one-way notifications
- ⚡ Smart Triggers: Hooks that suggest calling/messaging when you're stuck or done with work
- 🔀 Mix and Match: Use voice, chat, or both based on your needs
- 🧠 Parallel Execution: AI continues working while waiting for your response—searching code, running tests, preparing next steps
AgentComms provides bidirectional communication between humans and AI agents:
AgentComms
┌──────────────────────┐
│ │
┌──────────┐ │ ┌────────────┐ │ ┌──────────┐
│ AI Agent │ ────▶│ │ MCP Server │ │◀──── │ Human │
│ Claude / │ │ │ (OUTBOUND) │ │ │ (Discord │
│ Codex │ ◀────│ └────────────┘ │────▶ │ Phone) │
└──────────┘ │ │ └──────────┘
│ ┌────────────┐ │
│ │ Daemon │ │
│ │ (INBOUND) │ │
│ └────────────┘ │
│ │ │
│ ┌────┴────┐ │
│ │ tmux │ │
│ │ pane │ │
│ └─────────┘ │
└──────────────────────┘
Two communication modes:
| Mode | Direction | Use Case |
|---|---|---|
| OUTBOUND | Agent → Human | AI needs input, reports completion, escalates blockers |
| INBOUND | Human → Agent | Interrupt agent, send instructions, coordinate multiple agents |
Outbound flow (Agent → Human):
- AI needs input → Calls your phone or sends a chat message
- You respond → Voice is transcribed, chat is read directly
- AI continues → Uses your input to complete the task
Inbound flow (Human → Agent):
- You send a message → Type in a Discord channel or send an SMS
- Daemon receives → Routes to the correct agent via tmux
- Agent sees it → Message appears in the agent's terminal
┌───────────────────────────────────────────────────────────────────────────┐
│ agentcomms │
├───────────────────────────────────────────────────────────────────────────┤
│ OUTBOUND (MCP Server) - Agent → Human │
│ ├── Voice Tools: initiate_call, continue_call, speak_to_user, end_call │
│ ├── Chat Tools: send_message, list_channels, get_messages │
│ ├── Voice Manager - Orchestrates calls via omnivoice │
│ └── Chat Manager - Routes messages via omnichat │
├───────────────────────────────────────────────────────────────────────────┤
│ INBOUND (Daemon) - Human → Agent │
│ ├── Router - Actor-style event dispatcher (goroutine per agent) │
│ ├── AgentBridge - Adapters for tmux, process, etc. │
│ ├── Event Store - SQLite database via Ent ORM │
│ └── Transports - Discord, Twilio (receives human messages) │
├───────────────────────────────────────────────────────────────────────────┤
│ Shared Infrastructure │
│ ├── omnivoice - Voice abstraction (TTS, STT, Transport, CallSystem) │
│ ├── omnichat - Chat abstraction (Discord, Telegram, WhatsApp) │
│ ├── mcpkit - MCP server with ngrok integration │
│ └── Ent - Database ORM with SQLite/PostgreSQL support │
├───────────────────────────────────────────────────────────────────────────┤
│ Provider Implementations │
│ ├── Voice: ElevenLabs, Deepgram, OpenAI, Twilio │
│ └── Chat: Discord, Telegram, WhatsApp │
└───────────────────────────────────────────────────────────────────────────┘
This project demonstrates the plexusone voice and chat AI stack:
| Package | Role | Description |
|---|---|---|
| omnivoice | Voice Abstraction | Batteries-included TTS/STT with registry-based provider lookup |
| omnichat | Chat Abstraction | Provider-agnostic chat messaging interface |
| elevenlabs-go | Voice Provider | ElevenLabs streaming TTS and STT |
| omnivoice-deepgram | Voice Provider | Deepgram streaming TTS and STT |
| omnivoice-openai | Voice Provider | OpenAI TTS and STT |
| omnivoice-twilio | Phone Provider | Twilio transport and call system |
| mcpkit | Server | MCP server runtime with ngrok and multiple transport modes |
- Go 1.25+
- For voice: Twilio account + ngrok account
- For chat: Discord/Telegram bot token (optional)
cd /path/to/agentcomms
go mod tidy
go build -o agentcomms ./cmd/agentcomms
AgentComms uses a unified JSON configuration file that combines all settings.
# Generate configuration file
./agentcomms config init
# Or generate minimal config (chat only, no voice)
./agentcomms config init --minimal
# Set environment variables for secrets
export DISCORD_TOKEN=your_discord_bot_token
export TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export TWILIO_AUTH_TOKEN=your_auth_token
export ELEVENLABS_API_KEY=your_elevenlabs_key
export DEEPGRAM_API_KEY=your_deepgram_key
export NGROK_AUTHTOKEN=your_ngrok_authtoken
# Validate configuration
./agentcomms config validate
The config file at ~/.agentcomms/config.json supports environment variable substitution:
{
"version": "1",
"server": { "port": 3333 },
"agents": [
{ "id": "claude", "type": "tmux", "tmux_session": "claude-code" }
],
"voice": {
"phone": {
"account_sid": "${TWILIO_ACCOUNT_SID}",
"auth_token": "${TWILIO_AUTH_TOKEN}",
"number": "+15551234567",
"user_number": "+15559876543"
},
"tts": { "provider": "elevenlabs", "api_key": "${ELEVENLABS_API_KEY}" },
"stt": { "provider": "deepgram", "api_key": "${DEEPGRAM_API_KEY}" },
"ngrok": { "auth_token": "${NGROK_AUTHTOKEN}" }
},
"chat": {
"discord": { "enabled": true, "token": "${DISCORD_TOKEN}" },
"channels": [
{ "channel_id": "discord:YOUR_CHANNEL_ID", "agent_id": "claude" }
]
}
}
See the Configuration Guide for full documentation.
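To illustrate what `${VAR}` substitution means for the config above, here is a minimal sketch using Go's standard `os.Expand`. It is an assumption that AgentComms works this way internally; `expandConfig` is a hypothetical helper, and details such as how unset variables are treated may differ in the real loader.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} (and $VAR) references in raw config text
// with values from lookup. Unset variables become empty strings, and
// substituted values are not JSON-escaped, so this is only a sketch.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	raw := `{"chat": {"discord": {"enabled": true, "token": "${DISCORD_TOKEN}"}}}`

	// A fixed map stands in for the process environment here; passing
	// os.Getenv as the lookup would read real environment variables.
	env := map[string]string{"DISCORD_TOKEN": "example-token"}
	expanded := expandConfig(raw, func(k string) string { return env[k] })

	var cfg struct {
		Chat struct {
			Discord struct {
				Enabled bool   `json:"enabled"`
				Token   string `json:"token"`
			} `json:"discord"`
		} `json:"chat"`
	}
	if err := json.Unmarshal([]byte(expanded), &cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.Chat.Discord.Token)
}
```

Keeping secrets in environment variables means the config file itself can be shared or committed without exposing tokens.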
AgentComms provides two main commands:
# Run MCP server (OUTBOUND - spawned by AI assistant)
./agentcomms serve
# Run daemon (INBOUND - background service for human messages)
./agentcomms daemon
Running ./agentcomms without a subcommand defaults to serve for backwards compatibility.
./agentcomms serve
Output:
Starting agentcomms MCP server...
Using plexusone stack:
- omnivoice (voice abstraction)
- omnichat (chat abstraction)
- mcpkit (MCP server)
Voice providers: tts=elevenlabs stt=deepgram
Chat providers: [discord telegram]
MCP server ready
Local: http://localhost:3333/mcp
Public: https://abc123.ngrok.io/mcp
The daemon enables human-to-agent communication. It runs as a background service and routes messages from Discord/Twilio to agents running in tmux.
./agentcomms daemon
Output:
INFO starting daemon data_dir=/Users/you/.agentcomms socket=/Users/you/.agentcomms/daemon.sock
INFO database initialized path=/Users/you/.agentcomms/data.db
INFO router initialized
INFO daemon started
Data storage: ~/.agentcomms/
- config.json - Unified configuration file
- data.db - SQLite database (events, agents)
- daemon.sock - Unix socket for CLI/API
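As a rough picture of how the daemon could hand a human message to an agent running in tmux, the sketch below builds `tmux send-keys` invocations. The `send-keys` subcommand, its `-t` and `-l` flags, and the `Enter` and `C-c` key names are standard tmux; everything else (`sendArgs`, `enterArgs`, `interruptArgs`, `deliver`) is hypothetical and not taken from the project's tmux adapter.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sendArgs builds tmux arguments that type text literally (-l) into the
// target session or pane. "--" keeps text starting with "-" from being
// parsed as a flag.
func sendArgs(target, text string) []string {
	return []string{"send-keys", "-t", target, "-l", "--", text}
}

// enterArgs builds arguments that press Enter in the target.
func enterArgs(target string) []string {
	return []string{"send-keys", "-t", target, "Enter"}
}

// interruptArgs builds arguments that send Ctrl-C to the target.
func interruptArgs(target string) []string {
	return []string{"send-keys", "-t", target, "C-c"}
}

// deliver types the message into the target and submits it with Enter.
func deliver(target, text string) error {
	for _, args := range [][]string{sendArgs(target, text), enterArgs(target)} {
		if err := exec.Command("tmux", args...).Run(); err != nil {
			return fmt.Errorf("tmux %v: %w", args, err)
		}
	}
	return nil
}

func main() {
	// "claude-code" matches the tmux_session value in the example config.
	if err := deliver("claude-code", "Hey, can you also add unit tests?"); err != nil {
		fmt.Println("delivery failed:", err)
	}
}
```

Sending the text and the Enter key as separate invocations avoids tmux interpreting part of the message as a key name.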
Once the daemon is running, use these CLI commands to interact with it:
# Check daemon status
./agentcomms status
# List configured agents
./agentcomms agents
# Send a message to an agent (appears in tmux pane)
./agentcomms send <agent-id> "Your message here"
# Send an interrupt (Ctrl-C) to an agent
./agentcomms interrupt <agent-id>
# View recent events for an agent
./agentcomms events <agent-id> --limit 20
# Send a reply to a chat channel (outbound from agent)
./agentcomms reply discord:123456789 "Task completed!"
# List configured chat channels
./agentcomms channels
# Validate configuration
./agentcomms config validate
# Show current configuration
./agentcomms config show
Generate and edit the configuration:
# Generate config file
./agentcomms config init
# Edit ~/.agentcomms/config.json with your settings
# Validate configuration
./agentcomms config validate
See the Configuration Guide for full details.
agentcomms supports multiple AI coding assistants. Generate configuration files for your preferred tool:
# Generate for a specific tool
go run ./cmd/generate-plugin claude . # Claude Code
go run ./cmd/generate-plugin kiro . # AWS Kiro CLI
go run ./cmd/generate-plugin gemini . # Gemini CLI
# Generate for all tools
go run ./cmd/generate-plugin all ./plugins
Option 1: Use generated plugin files
go run ./cmd/generate-plugin claude .
This creates:
- .claude-plugin/plugin.json - Plugin manifest
- skills/phone-input/SKILL.md - Voice calling skill
- skills/chat-messaging/SKILL.md - Chat messaging skill
- commands/call.md - /call slash command
- commands/message.md - /message slash command
- .claude/settings.json - Lifecycle hooks
Option 2: Manual MCP configuration
Add to ~/.claude/settings.json or .claude/settings.json:
{
"mcpServers": {
"agentcomms": {
"command": "/path/to/agentcomms",
"env": {
"TWILIO_ACCOUNT_SID": "ACxxx",
"TWILIO_AUTH_TOKEN": "xxx",
"NGROK_AUTHTOKEN": "xxx",
"DISCORD_TOKEN": "xxx",
"ELEVENLABS_API_KEY": "xxx",
"DEEPGRAM_API_KEY": "xxx",
"AGENTCOMMS_AGENT_ID": "claude"
}
}
}
}
initiate_call: Start a new call to the user.
{
"message": "Hey! I finished implementing the feature. Want me to walk you through it?"
}
Returns:
{
"call_id": "call-1-1234567890",
"response": "Sure, go ahead and explain what you built."
}
continue_call: Continue an active call with another message.
{
"call_id": "call-1-1234567890",
"message": "I added authentication using JWT. Should I also add refresh tokens?"
}
speak_to_user: Speak without waiting for a response (useful for status updates).
{
"call_id": "call-1-1234567890",
"message": "Let me search for that in the codebase. Give me a moment..."
}
end_call: End the call with an optional goodbye message.
{
"call_id": "call-1-1234567890",
"message": "Perfect! I'll get started on that. Talk soon!"
}
send_message: Send a message to a chat channel.
{
"provider": "discord",
"chat_id": "123456789",
"message": "I've finished the PR! Here's the link: https://github.com/..."
}
list_channels: List available chat channels and their status.
{}
Returns:
{
"channels": [
{"provider_name": "discord", "status": "connected"},
{"provider_name": "telegram", "status": "connected"}
]
}
get_messages: Get recent messages from a chat conversation.
{
"provider": "telegram",
"chat_id": "987654321",
"limit": 5
}
These tools allow Claude Code to poll for messages sent by humans via the daemon.
Check for new messages sent to this agent from humans via chat.
{
"agent_id": "claude",
"limit": 10
}
Returns:
{
"messages": [
{
"id": "evt_01ABC123",
"channel_id": "discord:123456789",
"provider": "discord",
"text": "Hey, can you also add unit tests?",
"timestamp": "2024-01-15T10:30:00Z",
"type": "human_message"
}
],
"agent_id": "claude",
"has_more": false
}
Get all recent events for an agent (messages, interrupts, status changes).
{
"agent_id": "claude",
"since_id": "evt_01ABC123",
"limit": 20
}
Check if the agentcomms daemon is running.
{}
Returns:
{
"running": true,
"started_at": "2024-01-15T09:00:00Z",
"agents": 1,
"providers": ["discord", "telegram"]
}
Phone calls are ideal for:
- Reporting significant task completion
- Requesting urgent clarification when blocked
- Discussing complex decisions
- Walking through code changes
- Multi-step processes needing back-and-forth
Chat messaging is ideal for:
- Asynchronous status updates
- Sharing links, code, or formatted content
- Non-urgent notifications
- Follow-up summaries
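For clients that consume the polling tools described earlier, the response shape can be decoded as in the sketch below. The JSON field names are copied from the example response; the Go type names, `parsePoll`, and `splitChannelID` are hypothetical. It also assumes messages arrive oldest first, so the last ID can serve as the `since_id` cursor for the events tool; the source does not state the ordering.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// InboundMessage mirrors one entry of the "messages" array.
type InboundMessage struct {
	ID        string `json:"id"`
	ChannelID string `json:"channel_id"`
	Provider  string `json:"provider"`
	Text      string `json:"text"`
	Timestamp string `json:"timestamp"`
	Type      string `json:"type"`
}

// PollResult mirrors the top-level poll response.
type PollResult struct {
	Messages []InboundMessage `json:"messages"`
	AgentID  string           `json:"agent_id"`
	HasMore  bool             `json:"has_more"`
}

// parsePoll decodes one response and returns the updated cursor: the ID
// of the last message, or lastID unchanged when no messages arrived.
func parsePoll(data []byte, lastID string) (PollResult, string, error) {
	var res PollResult
	if err := json.Unmarshal(data, &res); err != nil {
		return res, lastID, err
	}
	if n := len(res.Messages); n > 0 {
		lastID = res.Messages[n-1].ID
	}
	return res, lastID, nil
}

// splitChannelID splits "discord:123456789" into provider and chat ID.
func splitChannelID(id string) (provider, chatID string, ok bool) {
	return strings.Cut(id, ":")
}

func main() {
	data := []byte(`{
	  "messages": [{
	    "id": "evt_01ABC123",
	    "channel_id": "discord:123456789",
	    "provider": "discord",
	    "text": "Hey, can you also add unit tests?",
	    "timestamp": "2024-01-15T10:30:00Z",
	    "type": "human_message"
	  }],
	  "agent_id": "claude",
	  "has_more": false
	}`)

	res, cursor, err := parsePoll(data, "")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, m := range res.Messages {
		provider, chatID, _ := splitChannelID(m.ChannelID)
		fmt.Printf("[%s/%s] %s\n", provider, chatID, m.Text)
	}
	fmt.Println("next since_id:", cursor, "has_more:", res.HasMore)
}
```

When `has_more` is true, a client would typically poll again right away with the updated cursor instead of waiting for its next interval.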
agentcomms/
├── cmd/
│ └── agentcomms/
│ ├── main.go # CLI entry point (serve, daemon)
│ └── commands.go # CLI commands (send, interrupt, reply, etc.)
├── internal/ # INBOUND infrastructure
│ ├── daemon/
│ │ ├── daemon.go # Background daemon service
│ │ ├── server.go # Unix socket server
│ │ ├── client.go # Client library for IPC
│ │ ├── protocol.go # JSON-RPC style protocol
│ │ └── config.go # Daemon configuration (YAML)
│ ├── router/
│ │ ├── router.go # Event dispatcher
│ │ └── actor.go # Per-agent actor (goroutine)
│ ├── bridge/
│ │ ├── adapter.go # Agent adapter interface
│ │ └── tmux.go # tmux adapter
│ ├── transport/
│ │ └── chat.go # Chat transport (omnichat)
│ └── events/
│ └── id.go # Event ID generation
├── ent/ # Database schema (Ent ORM)
│ └── schema/
│ ├── event.go # Event entity
│ └── agent.go # Agent entity
├── pkg/ # OUTBOUND infrastructure
│ ├── voice/
│ │ └── manager.go # Voice call orchestration
│ ├── chat/
│ │ └── manager.go # Chat message routing
│ ├── config/
│ │ ├── config.go # Legacy configuration
│ │ └── unified.go # Unified JSON configuration
│ └── tools/
│ └── tools.go # MCP tool definitions
├── examples/
│ └── config.json # Example JSON configuration
├── docs/
│ └── design/ # Architecture documentation
│ ├── FEAT_INBOUND_PRD.md
│ ├── FEAT_INBOUND_TRD.md
│ └── FEAT_INBOUND_PLAN.md
├── go.mod
└── README.md
- github.com/plexusone/omnivoice - Batteries-included voice abstraction
- github.com/plexusone/omnichat - Chat messaging abstraction
- github.com/plexusone/omnivoice-twilio - Twilio transport and call system
- github.com/plexusone/mcpkit - MCP server runtime
- github.com/modelcontextprotocol/go-sdk - MCP protocol SDK
- entgo.io/ent - Entity framework for Go (database ORM)
- modernc.org/sqlite - Pure Go SQLite driver
| Service | Cost |
|---|---|
| Twilio outbound calls | ~$0.014/min |
| Twilio phone number | ~$1.15/month |
| ElevenLabs TTS | |
| ElevenLabs STT | ~$0.10/min (Scribe) |
| Deepgram TTS | ~$0.015/1K chars |
| Deepgram STT | ~$0.0043/min (Nova-2) |
| OpenAI TTS | ~$0.015/1K chars |
| OpenAI STT | ~$0.006/min (Whisper) |
| Discord/Telegram | Free |
| ngrok (free tier) | $0 |
Provider Recommendations:
| Priority | TTS Provider | STT Provider | Total Cost/min | Notes |
|---|---|---|---|---|
| Lowest Cost | Deepgram | Deepgram | ~$0.03 | Best value, good quality |
| Best Quality | ElevenLabs | Deepgram | ~$0.05 | Premium voices, fast transcription |
| Balanced | OpenAI | OpenAI | ~$0.04 | Single API key, consistent quality |
Costs are approximate and exclude Twilio phone charges (~$0.014/min).
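Because the recommendation table excludes Twilio charges, the full per-call figure is the provider rate plus the Twilio rate, multiplied by call length. The sketch below works that arithmetic using the approximate rates from the tables above. The profile keys and `estimateCallCost` are hypothetical names for illustration, and the monthly phone number fee is not included.

```go
package main

import "fmt"

// twilioPerMin is the approximate outbound call rate in USD per minute,
// taken from the pricing table.
const twilioPerMin = 0.014

// providerPerMin holds the approximate combined TTS+STT cost per minute
// from the recommendation table. The keys are illustrative labels.
var providerPerMin = map[string]float64{
	"lowest-cost":  0.03, // Deepgram TTS + Deepgram STT
	"best-quality": 0.05, // ElevenLabs TTS + Deepgram STT
	"balanced":     0.04, // OpenAI TTS + OpenAI STT
}

// estimateCallCost returns minutes * (provider rate + Twilio rate) in USD.
// The second result is false when the profile is unknown.
func estimateCallCost(profile string, minutes float64) (float64, bool) {
	rate, ok := providerPerMin[profile]
	if !ok {
		return 0, false
	}
	return minutes * (rate + twilioPerMin), true
}

func main() {
	for _, p := range []string{"lowest-cost", "balanced", "best-quality"} {
		cost, _ := estimateCallCost(p, 10)
		fmt.Printf("%-12s 10 min: about $%.2f\n", p, cost)
	}
}
```

For example, a 10-minute call on the lowest-cost profile comes to roughly 10 × (0.03 + 0.014) = $0.44.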
MIT
Inspired by ZeframLou/call-me (TypeScript).
Built with the plexusone stack:
- omnivoice - Voice abstraction layer
- omnichat - Chat messaging abstraction
- elevenlabs-go - ElevenLabs provider
- omnivoice-deepgram - Deepgram provider
- omnivoice-twilio - Twilio provider
- mcpkit - MCP server runtime
- assistantkit - Multi-tool plugin configuration