OpenVIP is an open standard for transmitting voice interaction messages between applications, devices, and services.
Voice interaction = voice input + voice output. OpenVIP handles the full loop.
Voice assistants are everywhere, but there's no open standard for voice interaction. OpenVIP fills this gap:
- One format, many consumers — Send voice messages to any compatible system
- Transport agnostic — HTTP, WebSockets, MQTT (v1.0 uses HTTP/SSE)
- Simple core — Minimal required fields, maximum extensibility
- Best-effort — Unidirectional, fire-and-forget, simple enough for constrained devices
- Extensible — Structured metadata via x_ extension fields
- Observable — Built-in tracing via trace_id/parent_id (OpenTelemetry-style)

{
"openvip": "1.0",
"type": "transcription",
"id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-02-06T10:30:00Z",
"text": "Turn on the kitchen light",
"language": "en",
"confidence": 0.95
}

| Document | Description |
|---|---|
| Protocol v1.0 | Core message format |
| HTTP Binding | REST + SSE transport |
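
Since v1.0 delivers messages over HTTP/SSE, a consumer needs to turn an event stream into message objects. The following is a minimal sketch of that step, not the reference implementation. It assumes each SSE event's data payload is one JSON-encoded OpenVIP message; the HTTP Binding document is the authority on the actual framing and endpoints. The function name `iter_sse_messages` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    Assumption: each event's data payload is a single JSON-encoded
    OpenVIP message. Check the HTTP Binding document for the real framing.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# A hand-written stream standing in for an HTTP response body.
stream = [
    ": keep-alive\n",
    'data: {"openvip": "1.0", "type": "transcription",\n',
    'data:  "text": "Turn on the kitchen light"}\n',
    "\n",
]
messages = list(iter_sse_messages(stream))
print(messages[0]["text"])
```

The parser takes any iterable of text lines, so it can be fed from a file, a socket wrapper, or an HTTP client's line iterator without depending on a particular library.
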
| Type | Direction | Description |
|---|---|---|
| transcription | Engine → Agent | Transcribed text from speech-to-text |
| speech | Client → Engine | Text-to-speech request |

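
As an illustration of the transcription type, here is a small sketch of how an engine might assemble a message shaped like the transcription example earlier in this document. The helper name `make_transcription` is hypothetical, and treating `language` and `confidence` as optional is an assumption; the protocol spec and JSON Schema define which fields are actually required.

```python
import json
import uuid
from datetime import datetime, timezone


def make_transcription(text, language=None, confidence=None):
    """Build a transcription message with the fields shown in the example."""
    message = {
        "openvip": "1.0",
        "type": "transcription",
        "id": str(uuid.uuid4()),
        # UTC timestamp in the same form as the example ("...T10:30:00Z").
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "text": text,
    }
    # Assumption: language and confidence may be omitted.
    if language is not None:
        message["language"] = language
    if confidence is not None:
        message["confidence"] = confidence
    return message


msg = make_transcription("Turn on the kitchen light", language="en", confidence=0.95)
print(json.dumps(msg, indent=2))
```
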
Messages can be enriched with x_ extension fields — structured JSON objects
that carry metadata for domain-specific use cases.
| Extension | Description |
|---|---|
| x_input | Text input behavior (submit, newline, trigger) |
| x_agent_switch | Agent routing (switch active agent) |

{
"openvip": "1.0",
"type": "transcription",
"id": "...",
"timestamp": "...",
"text": "fix the login bug",
"x_input": {
"submit": true,
"trigger": "ok send",
"confidence": 0.95
}
}

Custom extensions use the x_<vendor_name> prefix (e.g., x_bticino, x_telegram).
See the protocol spec for details.
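
To show how a consumer might act on an extension, the sketch below reads `x_input` from a message like the one above and decides whether to submit the text. This is one possible interpretation, not behavior defined by the spec: the function name `resolve_input`, the `min_confidence` threshold, and the rule that a missing extension means "do not submit" are all assumptions.

```python
def resolve_input(message, min_confidence=0.8):
    """Return (text, should_submit) for a transcription message.

    Assumptions: a message without x_input is never auto-submitted, and
    x_input.confidence is compared against a caller-chosen threshold.
    """
    text = message.get("text", "")
    x_input = message.get("x_input")
    if not isinstance(x_input, dict):
        return text, False
    confidence = x_input.get("confidence", 1.0)
    should_submit = bool(x_input.get("submit")) and confidence >= min_confidence
    return text, should_submit


example = {
    "openvip": "1.0",
    "type": "transcription",
    "text": "fix the login bug",
    "x_input": {"submit": True, "trigger": "ok send", "confidence": 0.95},
}
text, should_submit = resolve_input(example)
print(text, should_submit)
```

Because the extension is read with `get`, a consumer that does not understand `x_input` can ignore it and still use the core `text` field.
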
Messages support OpenTelemetry-style tracing via optional fields:
| Field | Description |
|---|---|
| trace_id | ID of the original message that started the chain |
| parent_id | ID of the message this one was derived from |

This lets a consumer reconstruct a message chain from the messages themselves, without external tracing tooling.
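
The two tracing fields can be illustrated with a short sketch that derives one message from another and then collects a chain. The helper names `derive_message` and `chain_of` are hypothetical, the `text` field on the speech message is an assumption, and timestamps are omitted for brevity.

```python
import uuid


def derive_message(parent, **fields):
    """Create a message derived from `parent`, carrying the tracing fields."""
    child = {
        "openvip": parent["openvip"],
        "id": str(uuid.uuid4()),
        # Keep the chain's trace_id; if the parent has none, the parent
        # is the original message, so its own id starts the chain.
        "trace_id": parent.get("trace_id", parent["id"]),
        "parent_id": parent["id"],
    }
    child.update(fields)
    return child


def chain_of(messages, trace_id):
    """Return the original message and everything derived from it."""
    return [m for m in messages
            if m["id"] == trace_id or m.get("trace_id") == trace_id]


original = {
    "openvip": "1.0",
    "type": "transcription",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Turn on the kitchen light",
}
reply = derive_message(original, type="speech", text="Kitchen light is on")
followup = derive_message(reply, type="speech", text="Anything else?")

for m in chain_of([original, reply, followup], original["id"]):
    print(m["type"], m.get("parent_id"))
```

Every derived message keeps the same `trace_id` while `parent_id` points one step back, so both the whole chain and its ordering can be recovered from the messages alone.
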
See the examples/ directory for complete message samples.
JSON Schema for validation: schema/v1.0.json
| Project | Language | Type | Description |
|---|---|---|---|
| Dictare | Python | Engine + Client | Reference implementation — voice layer for AI coding agents |
| openvip SDK | Python | Client SDK | Auto-generated from OpenAPI spec |
- Core message format (transcription, speech)
- HTTP binding with SSE
- Extension fields (x_) with standard extensions
- Built-in tracing (trace_id, parent_id)
- Bidirectional communication
- WebSocket binding
- Authentication and authorization
See CONTRIBUTING.md for guidelines.
This specification is licensed under the MIT License.
OpenVIP™ is a trademark of Sitlab Limited. The OpenVIP name and logo are © 2026 Sitlab Limited. The protocol specification is MIT licensed — anyone can implement it freely. The trademark protects the name and logo, not the technology.
OpenVIP is an open project. Contributions welcome!