
Add WebSocket protocol guide for Realtime API #203

Draft
jaderf-sm wants to merge 3 commits into speechmatics:main from jaderf-sm:issue/202-websocket-docs

Conversation


@jaderf-sm jaderf-sm commented Feb 18, 2026

Add WebSocket protocol guide for Realtime API

Why

TL;DR: see #202

The Realtime quickstart only covers Python and JavaScript SDKs. Developers working in Go, Rust, Java, or any other language have no tutorial-style guide to follow — only the API reference spec, which documents schemas but doesn't walk through the actual message flow.

What this PR adds

A new guide at Speech to text > Realtime > Guides > WebSocket protocol that walks a developer through a complete Realtime transcription session using raw WebSocket messages:

  1. How to connect and authenticate (both server-side and browser-side)
  2. How to start a session with StartRecognition (with minimal and full examples)
  3. How to stream audio as binary frames and handle server acknowledgements
  4. How to receive final and partial transcripts
  5. How to cleanly end the session (a minimal end-to-end sketch of the full flow follows this list)
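
To make the flow concrete, here is a minimal end-to-end sketch in Go (one of the priority languages listed under "Future improvements"), using github.com/gorilla/websocket. The endpoint, the /v2/ path, and the StartRecognition / AudioAdded / EndOfStream messages are taken from this PR; the audio_format values, the RecognitionStarted / AddPartialTranscript / AddTranscript / EndOfTranscript names, and the metadata.transcript field reflect my reading of the API reference and should be verified against the published schemas. example.raw is a placeholder filename.

```go
// Minimal Realtime transcription session over a raw WebSocket.
// Sketch only: a production client needs error handling, backpressure,
// and reconnection logic (see the guide sections on those topics).
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"sync/atomic"

	"github.com/gorilla/websocket"
)

// serverMsg holds just the fields this sketch reads; real messages carry more.
type serverMsg struct {
	Message  string `json:"message"`
	SeqNo    int64  `json:"seq_no"`
	Metadata struct {
		Transcript string `json:"transcript"`
	} `json:"metadata"`
}

func main() {
	// 1. Connect and authenticate (server-side: API key in the Authorization header).
	header := http.Header{"Authorization": {"Bearer " + os.Getenv("SPEECHMATICS_API_KEY")}}
	conn, _, err := websocket.DefaultDialer.Dial("wss://eu.rt.speechmatics.com/v2/", header)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	var lastSeq atomic.Int64
	started := make(chan struct{}) // closed on RecognitionStarted
	done := make(chan struct{})    // closed on EndOfTranscript

	// 4. Read server messages: acknowledgements, partial and final transcripts.
	go func() {
		defer close(done)
		for {
			_, data, err := conn.ReadMessage()
			if err != nil {
				return
			}
			var m serverMsg
			if json.Unmarshal(data, &m) != nil {
				continue
			}
			switch m.Message {
			case "RecognitionStarted":
				close(started)
			case "AudioAdded":
				lastSeq.Store(m.SeqNo) // ack for one binary audio frame
			case "AddPartialTranscript":
				fmt.Println("partial:", m.Metadata.Transcript)
			case "AddTranscript":
				fmt.Println("final:  ", m.Metadata.Transcript)
			case "EndOfTranscript":
				return
			}
		}
	}()

	// 2. Start the session. Language goes in the body, not the URL path.
	start, _ := json.Marshal(map[string]any{
		"message": "StartRecognition",
		"audio_format": map[string]any{
			"type": "raw", "encoding": "pcm_s16le", "sample_rate": 16000,
		},
		"transcription_config": map[string]any{"language": "en"},
	})
	if err := conn.WriteMessage(websocket.TextMessage, start); err != nil {
		log.Fatal(err)
	}
	<-started

	// 3. Stream raw PCM audio as binary frames.
	f, err := os.Open("example.raw")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	buf := make([]byte, 8192)
	for {
		n, rerr := f.Read(buf)
		if n > 0 {
			if err := conn.WriteMessage(websocket.BinaryMessage, buf[:n]); err != nil {
				log.Fatal(err)
			}
		}
		if rerr == io.EOF {
			break
		}
		if rerr != nil {
			log.Fatal(rerr)
		}
	}

	// 5. End the session with the last acknowledged seq_no, then wait for
	//    EndOfTranscript. Note: the final AudioAdded ack may still be in
	//    flight here; a robust client waits for acks to catch up first.
	eos, _ := json.Marshal(map[string]any{
		"message":     "EndOfStream",
		"last_seq_no": lastSeq.Load(),
	})
	conn.WriteMessage(websocket.TextMessage, eos)
	<-done
}
```

Run it with SPEECHMATICS_API_KEY set and a raw PCM file in the working directory; the ffmpeg command discussed in question 4 below produces one.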

The guide also covers practical concerns that developers building their own clients need to know about:

  • Backpressure — what happens if you send audio faster than the server can process, and how to avoid connection drops (based on patterns from the Python SDK; see the pacing sketch after this list)
  • Session limits — the 48-hour, 1-hour idle, and 3-minute ping timeouts, including the warning messages the server sends before auto-terminating
  • Error handling — common WebSocket close codes and when to retry
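
On the backpressure bullet, one simple mitigation is to send audio at real-time rate rather than as fast as the source can be read, so the client never runs far ahead of the server. The sketch below is illustrative, not necessarily the Python SDK's exact mechanism; it assumes 16-bit mono PCM and drops into step 3 of the sketch above (imports: io, time, and gorilla/websocket).

```go
// paceAudio sends raw PCM in 100 ms chunks, one per 100 ms tick, so audio is
// streamed at roughly real-time rate instead of as fast as the file reads.
// Assumes 16-bit mono PCM, so 100 ms = sampleRate * 2 bytes / 10.
func paceAudio(conn *websocket.Conn, r io.Reader, sampleRate int) error {
	chunk := make([]byte, sampleRate*2/10)
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		n, err := io.ReadFull(r, chunk)
		if n > 0 {
			if werr := conn.WriteMessage(websocket.BinaryMessage, chunk[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of audio
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```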

Small cross-link additions were made to three existing pages so developers can discover the new guide:

  • Quickstart — info box at the top for developers looking for the raw protocol
  • Input page — new bullet in the "Next steps" section
  • API reference — tip after the protocol overview diagram

Open Technical Questions

  1. Which endpoint URL should we document? The existing docs use eu.rt.speechmatics.com but both SDKs default to eu2.rt.speechmatics.com. The guide currently uses the docs version.
  2. Is language needed in the URL path? The Python SDK connects to /v2/en but the JS SDK and API reference use just /v2/. The guide uses /v2/ and puts language in the StartRecognition body only.
  3. What does last_seq_no do on the server side? The guide documents the mechanics (set it to the last seq_no you received from AudioAdded) but a brief explanation of why would make the guide more helpful.
  4. Sample audio for testing — the guide links to the existing example.wav from the JS SDK repo and provides an ffmpeg command for raw PCM conversion. Would a hosted .raw file be better?
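
On question 4, the conversion command is presumably along these lines; the 16 kHz mono, 16-bit little-endian parameters are assumptions here and must match the audio_format declared in StartRecognition:

```sh
ffmpeg -i example.wav -f s16le -acodec pcm_s16le -ac 1 -ar 16000 example.raw
```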

How to review

  1. Start the dev server (volta run npm start) and navigate to Speech to text > Realtime > Guides > WebSocket protocol
  2. Walk through the guide as if you were a developer in an unsupported language — does the message flow make sense? Are the JSON examples clear?
  3. Are the technical details in the documentation correct? (these were inferred from available SDK source code)
  4. Check the cross-links from the quickstart, input page, and API reference

Test plan

  • [volta run] npm run build passes with no broken links
  • Page renders correctly and sidebar shows "WebSocket protocol" under Guides
  • Mermaid diagram renders in both light and dark themes
  • Cross-links from quickstart, input, and API reference all work
  • SME review of technical details documented

Note: SME questions for review are marked as JSX comments ({/* SME-REVIEW */}) in the websocket-protocol.mdx source where these open questions apply. Search for SME-REVIEW to find them.

URLs to test

| Page | What to check |
| --- | --- |
| /speech-to-text/realtime/guides/websocket-protocol | New page renders, Mermaid diagram displays, all links resolve |
| /speech-to-text/realtime/quickstart | New :::info admonition links to WebSocket protocol guide |
| /speech-to-text/realtime/input | New bullet in "Next steps" links to WebSocket protocol guide |
| /api-ref/realtime-transcription-websocket | New :::tip after Mermaid diagram links to WebSocket protocol guide |

Future improvements

The guide currently describes the raw WebSocket message flow without language-specific code samples. A natural next step is to add short, self-contained code snippets that demonstrate connecting to the API and running a basic transcription session.

Priority languages (no official SDK available):

  • Go — widely used for backend services and CLI tools
  • Java/Kotlin — dominant in enterprise and Android development
  • Swift — needed for native iOS/macOS clients
  • Ruby — popular in web development (Rails ecosystem), but probably lower priority as it is quickly losing market share

Each sample would cover the full lifecycle: connect, authenticate, send StartRecognition, stream audio, receive transcripts, and send EndOfStream. The goal is a copy-paste-and-run experience, similar to how the existing quickstart works for Python and JavaScript.

Closes #202

@jaderf-sm jaderf-sm self-assigned this Feb 18, 2026
@jaderf-sm jaderf-sm added documentation Improvements or additions to documentation enhancement New feature or request labels Feb 18, 2026

vercel bot commented Feb 18, 2026

@jaderfeijo is attempting to deploy a commit to the Speechmatics Team on Vercel.

A member of the Team first needs to authorize it.


vercel bot commented Feb 18, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Feb 18, 2026 2:09pm |

A US endpoint is also available at `wss://us.rt.speechmatics.com/v2`. See the [full endpoint list](/get-started/authentication#supported-endpoints) for all regions.

{/* <!-- SME-REVIEW: Python SDK appends language to path (/v2/en). JS SDK and API reference use /v2/ only. Confirming /v2/ is correct for raw WebSocket clients. --> */}
Collaborator

Language is not needed in the URL path. That is the legacy URL pattern, which may still work fine, but we should definitely encourage the new scheme.


### 5. End the session

When you've finished sending audio, send an `EndOfStream` message. Set `last_seq_no` to the `seq_no` from the last `AudioAdded` message you received:
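
For example (the last_seq_no value is illustrative; the message and field names are from the guide text above):

```json
{
  "message": "EndOfStream",
  "last_seq_no": 123
}
```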
Collaborator

To your question about the server-side purpose of last_seq_no: I believe it's so that we only return transcription results for the audio frames before that seq_no, to ensure a clean cutoff. I need to confirm that, though.
