
@ekaone/rendition

OpenAI Realtime Translation SDK — WebSocket + WebRTC transports, React hook, zero-dependency core.

```shell
npm install @ekaone/rendition
```

Transports

| Transport | When to use |
|---|---|
| `"webrtc"` | Browser mic capture: audio flows via `RTCPeerConnection`, no manual audio piping |
| `"websocket"` | Server-side pipelines: Twilio, SIP, broadcast ingest, media workers |

createTranslationSession() auto-selects: "webrtc" in browser, "websocket" in Node.
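The auto-selection rule can be pictured as a simple environment check. This is a sketch under assumptions, not the library's `factory.ts`: `pickTransport` is a hypothetical name, and treating "has `window` and `RTCPeerConnection`" as "browser" is an assumed heuristic.

```typescript
type Transport = "webrtc" | "websocket";

// Hypothetical helper: approximates the documented default
// ("webrtc" in a browser, "websocket" in Node). The detection heuristic is an assumption.
function pickTransport(explicit?: Transport): Transport {
  if (explicit) return explicit; // an explicit `transport` option always wins

  // Read globals loosely so this compiles without DOM typings.
  const g = globalThis as unknown as Record<string, unknown>;
  const looksLikeBrowser =
    typeof g["window"] !== "undefined" && typeof g["RTCPeerConnection"] === "function";
  return looksLikeBrowser ? "webrtc" : "websocket";
}
```

Passing `transport` explicitly sidesteps detection entirely, which is what the server-side examples in this README do.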


React hook (Vite / Next.js)

```shell
npm install react react-dom
```

```tsx
"use client"; // Next.js App Router only; omit in Vite.

import { useTranslationSession } from "@ekaone/rendition/react";

// ephemeralClientSecret: a short-lived client secret minted by your backend.
// Never ship a raw OpenAI API key to the browser.
export function TranslationWidget({ ephemeralClientSecret }: { ephemeralClientSecret: string }) {
  const {
    status,
    outputTranscript,
    inputTranscript,
    isSpeaking,
    mediaStream,
    connect,
    disconnect,
    error,
  } = useTranslationSession({
    apiKey: ephemeralClientSecret,
    targetLanguage: "es",          // BCP-47 tag
    transport: "webrtc",           // mic → WebRTC → OpenAI
  });

  // Mute/unmute the local mic track(s) without tearing down the session.
  function toggleMic() {
    if (!mediaStream) return;
    for (const track of mediaStream.getAudioTracks()) {
      track.enabled = !track.enabled;
    }
  }

  const canStart = status === "idle" || status === "closed";

  return (
    <div>
      <p>Status: {status} {isSpeaking && "(speaking)"}</p>
      <button onClick={() => void connect()} disabled={!canStart}>Start</button>
      <button onClick={() => void disconnect()} disabled={canStart}>Stop</button>
      <button onClick={toggleMic} disabled={!mediaStream}>Mute / Unmute</button>
      {error && <p>Error: {error}</p>}
      <p><strong>Source:</strong> {inputTranscript}</p>
      <p><strong>Translation:</strong> {outputTranscript}</p>
    </div>
  );
}
```

Hook options

| Option | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | required | OpenAI API key or ephemeral client secret |
| `targetLanguage` | `string` | required | BCP-47 output language tag, e.g. `"es"`, `"fr"`, `"ja"` |
| `transport` | `"webrtc" \| "websocket"` | `"webrtc"` in browser | Transport to use |
| `safetyIdentifier` | `string` | none | Hashed user ID sent as the `OpenAI-Safety-Identifier` header |
| `model` | `string` | `"gpt-realtime-translate"` | Model override |
| `wsEndpoint` | `string` | OpenAI default | Custom WebSocket endpoint |
| `onAudioDelta` | `(delta: string) => void` | none | Override audio playback (WebSocket path) |
| `autoConnect` | `boolean` | `false` | Connect automatically on mount |
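Since `targetLanguage` expects a BCP-47 tag, one option is to canonicalize user input before handing it to the hook. The sketch below is a hypothetical helper (`normalizeTargetLanguage` is not part of the SDK) built on the standard `Intl.getCanonicalLocales`. Note that a well-formed tag is not necessarily a language the model supports; that list is not documented here.

```typescript
// Hypothetical helper: canonicalize a BCP-47 tag before passing it as `targetLanguage`.
// Intl.getCanonicalLocales throws a RangeError for structurally invalid tags.
function normalizeTargetLanguage(tag: string): string {
  const canonical = Intl.getCanonicalLocales(tag)[0];
  if (canonical === undefined) {
    throw new RangeError(`No locale in ${JSON.stringify(tag)}`);
  }
  return canonical;
}
```

For example, `"ES"` becomes `"es"` and `"pt-br"` becomes `"pt-BR"`, while a malformed string fails fast instead of reaching the session.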

Hook result

| Field | Type | Description |
|---|---|---|
| `status` | `TranslationSessionStatus` | One of `idle`, `connecting`, `ready`, `translating`, `closing`, `closed`, `error` |
| `outputTranscript` | `string` | Accumulated translated transcript |
| `inputTranscript` | `string` | Accumulated source transcript |
| `isSpeaking` | `boolean` | `true` while translated transcript deltas keep arriving; resets to `false` after ~700 ms without one |
| `mediaStream` | `MediaStream \| null` | Local mic stream (WebRTC only); use it to mute or replace the mic track |
| `connect` | `() => Promise<void>` | Start the session |
| `disconnect` | `() => Promise<void>` | Close the session gracefully |
| `appendAudio` | `(b64: string) => void` | Append audio manually (WebSocket path) |
| `error` | `string \| null` | Last error message |
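The `isSpeaking` rule (true while deltas arrive, false after ~700 ms of silence) amounts to comparing timestamps. The sketch below is a hypothetical `SpeakingTracker`, not the hook's implementation; it takes timestamps as arguments (e.g. from `performance.now()`) so the logic stays deterministic.

```typescript
// Hypothetical sketch of the documented isSpeaking semantics: true while transcript
// deltas keep arriving, false once `idleMs` pass without one.
class SpeakingTracker {
  private lastDeltaAt: number | null = null;
  private readonly idleMs: number;

  constructor(idleMs = 700) {
    this.idleMs = idleMs;
  }

  // Call on every session.output_transcript.delta event.
  onDelta(now: number): void {
    this.lastDeltaAt = now;
  }

  isSpeaking(now: number): boolean {
    return this.lastDeltaAt !== null && now - this.lastDeltaAt < this.idleMs;
  }
}
```

A React hook would additionally need a timer that forces a re-render once the idle window elapses, since nothing else would trigger one.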

Core API (framework-agnostic)

createTranslationSession(config)

```ts
import { createTranslationSession } from "@ekaone/rendition";

// Run as an ES module (top-level await); assumes OPENAI_API_KEY is set.
const session = await createTranslationSession({
  apiKey: process.env.OPENAI_API_KEY!,
  targetLanguage: "fr",
  transport: "websocket",
});

// Print translated text as it streams in.
session.on("session.output_transcript.delta", (evt) => {
  process.stdout.write(evt.delta);
});

session.on("session.output_audio.delta", (evt) => {
  // write evt.delta (base64 PCM16) to your media output
});

// base64Pcm16Chunk: placeholder for base64-encoded PCM16 audio from your source.
session.appendAudio(base64Pcm16Chunk);

await session.close();
```
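On a server, source audio often arrives as a raw PCM16 buffer that needs slicing before `appendAudio`. The sketch below is a hypothetical helper, `chunkPcm16ToBase64`, assuming mono little-endian PCM16 at 24 kHz (the rate this README passes to `playPcm16Delta`); the input rate the session expects is not stated here, so confirm it.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: split raw mono PCM16 audio into fixed-duration base64 chunks
// for session.appendAudio(). The 24 kHz default is an assumption.
function chunkPcm16ToBase64(pcm: Buffer, sampleRate = 24000, chunkMs = 100): string[] {
  const bytesPerChunk = Math.floor((sampleRate * chunkMs) / 1000) * 2; // 2 bytes per sample
  if (bytesPerChunk <= 0) throw new RangeError("chunk must hold at least one sample");

  const chunks: string[] = [];
  for (let offset = 0; offset < pcm.length; offset += bytesPerChunk) {
    // subarray shares memory with `pcm`; toString("base64") produces a new string.
    chunks.push(pcm.subarray(offset, offset + bytesPerChunk).toString("base64"));
  }
  return chunks;
}
```

Usage would look like `for (const c of chunkPcm16ToBase64(readFileSync("input.pcm"))) session.appendAudio(c);` (the file name is illustrative). A live source would instead append each chunk as it is captured.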

createNodeWebSocketSession(config)

Server-side variant built on the `ws` package. Unlike the browser WebSocket API, it can set the `Authorization` and `OpenAI-Safety-Identifier` headers.

```shell
npm install ws
```

```ts
import { createNodeWebSocketSession } from "@ekaone/rendition";

const session = await createNodeWebSocketSession({
  apiKey: process.env.OPENAI_API_KEY!, // raw key is acceptable server-side
  targetLanguage: "ja",
  safetyIdentifier: "hashed-user-id", // a hash of your internal user ID, not the raw ID
});
```
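`safetyIdentifier` expects a hashed user ID. A minimal way to derive one on the server is a SHA-256 digest via `node:crypto`; `hashUserId` is a hypothetical helper, not an SDK export.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable, non-reversible safetyIdentifier from an
// internal user ID so the raw ID never leaves your server.
function hashUserId(userId: string): string {
  return createHash("sha256").update(userId, "utf8").digest("hex");
}
```

A plain hash of a low-entropy ID (such as an email address) can be brute-forced; a keyed HMAC (`createHmac("sha256", serverSecret)`) is a stronger choice if that matters for your threat model.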

Audio utilities

Audio helpers are available via a dedicated subpath so they do not bloat the session protocol surface:

```ts
import {
  float32ToBase64Pcm16,
  base64Pcm16ToFloat32,
  playPcm16Delta,
} from "@ekaone/rendition/utils";

// Placeholders: float32Array holds Web Audio samples (e.g. from an AudioWorklet);
// audioContext is your AudioContext.

// Web Audio Float32 → base64 PCM16 (for appendAudio)
const b64 = float32ToBase64Pcm16(float32Array);

// base64 PCM16 → Float32 (for custom playback)
const float32 = base64Pcm16ToFloat32(b64);

// Play a delta chunk directly via the Web Audio API (24000: sample rate in Hz)
playPcm16Delta(audioContext, b64, 24000);
```

The same helpers are also re-exported from the main @ekaone/rendition entry for backwards compatibility.
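For readers curious what these conversions involve, the sketch below re-implements the standard Float32 ↔ PCM16 mapping: clamp to [-1, 1], scale to 16-bit integers, store little-endian, then base64-encode. The `…Sketch` names are hypothetical and this is not the library's `audio.ts`; in practice use the exported helpers.

```typescript
// Hypothetical re-implementation for illustration only. PCM16 is written little-endian.
function float32ToBase64Pcm16Sketch(samples: Float32Array): string {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    // Negative values reach -32768, positive values reach 32767.
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  let binary = "";
  for (let i = 0; i < view.byteLength; i++) binary += String.fromCharCode(view.getUint8(i));
  return btoa(binary);
}

function base64Pcm16ToFloat32Sketch(b64: string): Float32Array {
  const binary = atob(b64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const out = new Float32Array(Math.floor(binary.length / 2));
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff; // inverse of the scaling above
  }
  return out;
}
```

The asymmetric scale (0x8000 for negatives, 0x7fff for positives) lets both -1 and 1 map to the extremes of the signed 16-bit range without overflow.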


Session events

| Event | Payload | Description |
|---|---|---|
| `session.output_audio.delta` | `{ delta: string }` | Base64 PCM16 translated audio chunk |
| `session.output_transcript.delta` | `{ delta: string }` | Translated transcript delta |
| `session.input_transcript.delta` | `{ delta: string }` | Source transcript delta |
| `session.closed` | none | Session fully flushed and closed |
| `session.created` | `{ session }` | Session created |
| `session.updated` | `{ session }` | Session config updated |
| `error` | `{ error }` | API error |
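An event table like this maps naturally onto a typed emitter, where the event name fixes the payload type. The sketch below is a hypothetical, zero-dependency `TypedEmitter`, not the SDK's `emitter.ts`; payloads the table leaves unspecified (`session`, `error`) are typed as `unknown`, and returning an unsubscribe function from `on` is a design choice of the sketch.

```typescript
// Event map transcribed from the table; unspecified payload shapes are `unknown`.
interface SessionEvents {
  "session.output_audio.delta": { delta: string };
  "session.output_transcript.delta": { delta: string };
  "session.input_transcript.delta": { delta: string };
  "session.closed": undefined;
  "session.created": { session: unknown };
  "session.updated": { session: unknown };
  error: { error: unknown };
}

type Handler<T> = (payload: T) => void;

// Hypothetical typed emitter: on("x", h) only accepts handlers for x's payload type.
class TypedEmitter<Events> {
  private readonly handlers = new Map<keyof Events, Array<(payload: unknown) => void>>();

  // Registers a handler and returns a function that removes it.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const wrapped = handler as (payload: unknown) => void;
    const list: Array<(payload: unknown) => void> = this.handlers.get(event) ?? [];
    list.push(wrapped);
    this.handlers.set(event, list);
    return () => {
      const i = list.indexOf(wrapped);
      if (i !== -1) list.splice(i, 1);
    };
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const list = this.handlers.get(event);
    if (!list) return;
    // Iterate over a copy so a handler that unsubscribes mid-emit doesn't skip others.
    for (const h of list.slice()) h(payload);
  }
}
```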

Session lifecycle

```text
idle → connecting → ready → translating → closing → closed
                                        ↘ error
```
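One way to enforce the lifecycle is an explicit transition table. This is a hypothetical sketch: the forward chain comes from the diagram, while the other edges are assumptions. `closing` is allowed from `connecting` and `ready` because the React example's Stop button is enabled there, `error` is allowed from any active state, and `closed → connecting` reflects the Start button being re-enabled once closed. `SessionStatus` mirrors the values listed for `TranslationSessionStatus`.

```typescript
type SessionStatus =
  | "idle" | "connecting" | "ready" | "translating" | "closing" | "closed" | "error";

// Hypothetical transition table; edges beyond the diagram's forward chain are assumptions.
const ALLOWED: Record<SessionStatus, readonly SessionStatus[]> = {
  idle: ["connecting"],
  connecting: ["ready", "closing", "error"],
  ready: ["translating", "closing", "error"],
  translating: ["closing", "error"],
  closing: ["closed", "error"],
  closed: ["connecting"],
  error: [], // recovery from "error" is not documented; treated as terminal here
};

// Returns the new status, or throws if the move is not in the table.
function transition(from: SessionStatus, to: SessionStatus): SessionStatus {
  if (!ALLOWED[from].includes(to)) {
    throw new Error(`invalid transition: ${from} → ${to}`);
  }
  return to;
}
```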

WebSocket teardown follows the spec:

  1. Send session.close
  2. Keep reading events — do not close the socket yet
  3. Receive session.closed → close the socket

Skipping step 2 drops buffered translated audio still draining from the session.
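The three teardown steps can be sketched against a raw socket. This is a hypothetical illustration, not the SDK's `websocket-session.ts`: `MessageSocket`, `gracefulClose`, the `{ type: "session.close" }` JSON shape, and the timeout fallback are all assumptions. The minimal socket interface lets a fake stand in for a real connection.

```typescript
// Minimal socket shape so the sketch stays self-contained.
interface MessageSocket {
  send(data: string): void;
  close(): void;
  addEventListener(type: "message", listener: (event: { data: unknown }) => void): void;
}

type ServerEvent = { type: string; [key: string]: unknown };

// Hypothetical sketch: send session.close, keep handling events while translated
// audio drains, and close the socket only on session.closed. The timeout is an added
// assumption so a lost event cannot hang shutdown forever.
function gracefulClose(
  socket: MessageSocket,
  onEvent: (evt: ServerEvent) => void,
  timeoutMs = 5000,
): Promise<"closed" | "timeout"> {
  return new Promise((resolve) => {
    let done = false;
    const finish = (result: "closed" | "timeout") => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      socket.close(); // step 3: only now is the socket closed
      resolve(result);
    };
    const timer = setTimeout(() => finish("timeout"), timeoutMs);

    socket.addEventListener("message", (event) => {
      if (done) return;
      const evt = JSON.parse(String(event.data)) as ServerEvent;
      onEvent(evt); // step 2: keep delivering draining audio/transcript deltas
      if (evt.type === "session.closed") finish("closed");
    });

    socket.send(JSON.stringify({ type: "session.close" })); // step 1
  });
}
```

With the SDK, `await session.close()` is the documented entry point; the sketch only makes the required ordering explicit.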


Exports

| Import path | Contents |
|---|---|
| `@ekaone/rendition` | `createTranslationSession`, `createNodeWebSocketSession`, `WebSocketSession`, `WebRTCSession`, audio utils, all types |
| `@ekaone/rendition/react` | `useTranslationSession`, hook types |
| `@ekaone/rendition/utils` | `float32ToBase64Pcm16`, `base64Pcm16ToFloat32`, `playPcm16Delta` |

Architecture

```text
src/
├── types/         # Shared TypeScript types
├── core/
│   ├── emitter.ts             # Typed zero-dep event emitter
│   ├── audio.ts               # PCM16 ↔ Float32, Web Audio playback
│   ├── websocket-session.ts   # Browser + Node WebSocket sessions
│   ├── webrtc-session.ts      # Browser WebRTC session
│   └── factory.ts             # createTranslationSession()
├── react/
│   ├── useTranslationSession.ts
│   └── index.ts
├── utils/
│   └── index.ts               # Audio helper re-exports
└── index.ts       # Core barrel
```

License

MIT © Eka Prasetia
