Get started using the Soniox audio transcription loader in LangChain.
Install the package:
pip install langchain-soniox

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key

Transcribe audio files using the SonioxDocumentLoader:
from langchain_soniox import SonioxDocumentLoader

# Using a URL
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3"
)
docs = list(loader.lazy_load())
print(docs[0].page_content)  # Transcribed text

You can also load audio from a local file or from bytes:
# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)

For async operations, use alazy_load():
import asyncio

from langchain_soniox import SonioxDocumentLoader


async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)


asyncio.run(transcribe_async())

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.
Language hints do not restrict recognition; they only bias the model toward the specified languages, and other languages can still be detected if present.
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)
docs = list(loader.lazy_load())

For more details, see the Soniox language hints documentation.
Enable speaker identification to distinguish between different speakers:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)
docs = list(loader.lazy_load())

# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

Enable automatic language detection and identification:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)
docs = list(loader.lazy_load())

# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.
The context object supports four optional sections: general, text, terms, and translation_terms.
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)
docs = list(loader.lazy_load())

For more details, see the Soniox context documentation.
Translate from any detected language to a target language:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)
docs = list(loader.lazy_load())

# Split tokens into translated and original text by their translation status
translated_text = ""
original_text = ""
for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]
print("Original text:", original_text)
print("Translated text:", translated_text)

You can also transcribe and translate between two languages simultaneously using the two_way translation type. For more details, see the Soniox translation documentation.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | str | No* | None | Path to local audio file to transcribe |
| file_data | bytes | No* | None | Binary data of audio file to transcribe |
| file_url | str | No* | None | URL of audio file to transcribe |
| api_key | str | No | SONIOX_API_KEY env var | Soniox API key |
| base_url | str | No | https://api.soniox.com/v1 | API base URL (see regional endpoints) |
| options | SonioxTranscriptionOptions | No | SonioxTranscriptionOptions() | Transcription options |
| polling_interval_seconds | float | No | 1.0 | Time between status polls (seconds) |
| timeout_seconds | float | No | 300.0 (5 minutes) | Maximum time to wait for transcription |
| http_request_timeout_seconds | float | No | 60.0 | Timeout for individual HTTP requests |
* You must specify exactly one of: file_path, file_data, or file_url.
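The exactly-one rule can be checked in your own code before the loader is constructed. The following is a minimal sketch, not the library's implementation: pick_audio_source is a hypothetical helper that assumes only the three parameter names from the table above, returns the single source argument that was provided, and raises ValueError otherwise.

```python
from typing import Optional


def pick_audio_source(
    file_path: Optional[str] = None,
    file_data: Optional[bytes] = None,
    file_url: Optional[str] = None,
) -> dict:
    """Return the one source argument that was provided (hypothetical helper)."""
    candidates = {
        "file_path": file_path,
        "file_data": file_data,
        "file_url": file_url,
    }
    # Keep only the arguments the caller actually supplied.
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of file_path, file_data, or file_url; "
            f"got {sorted(provided) or 'none'}"
        )
    return provided
```

The returned dictionary can be unpacked into the loader, for example SonioxDocumentLoader(**pick_audio_source(file_url=url)), so that a missing or ambiguous source fails early with a clear message.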
The SonioxTranscriptionOptions class supports these parameters:
| Parameter | Type | Description |
|---|---|---|
| model | str | Async model to use (see available models) |
| language_hints | list[str] | Language hints for transcription (ISO language codes) |
| language_hints_strict | bool | Enforce strict language hints |
| enable_speaker_diarization | bool | Enable speaker identification |
| enable_language_identification | bool | Enable language detection |
| translation | TranslationConfig | Translation configuration |
| context | StructuredContext | Context for improved accuracy |
| client_reference_id | str | Custom reference ID for your records |
| webhook_url | str | Webhook URL for completion notifications |
| webhook_auth_header_name | str | Custom auth header name for webhook |
| webhook_auth_header_value | str | Custom auth header value for webhook |
Browse the API documentation for a full list of supported options.
The lazy_load() and alazy_load() methods yield a single Document object:
Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)

The tokens array in metadata includes detailed information for each transcribed token:

- text: The transcribed text
- start_ms: Start time in milliseconds
- end_ms: End time in milliseconds
- speaker: Speaker ID (if diarization enabled), for example "1", "2", etc.
- language: Detected language (if identification enabled), for example "en", "fr", etc.
- translation_status: Translation status ("original", "translation" or "none")
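The token fields can be combined into timestamped speaker turns. The sketch below is illustrative only: speaker_turns is a hypothetical helper, the sample tokens are made up to match the field list above rather than taken from real API output, and it assumes every token carries start_ms and end_ms.

```python
def speaker_turns(tokens):
    """Merge consecutive tokens from the same speaker into turns."""
    turns = []
    for token in tokens:
        speaker = token.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker: extend the current turn.
            turns[-1]["text"] += token["text"]
            turns[-1]["end_ms"] = token["end_ms"]
        else:
            # Speaker changed: start a new turn.
            turns.append(
                {
                    "speaker": speaker,
                    "text": token["text"].lstrip(),
                    "start_ms": token["start_ms"],
                    "end_ms": token["end_ms"],
                }
            )
    return turns


# Made-up sample tokens in the documented shape (not real API output).
sample_tokens = [
    {"text": "Hi", "start_ms": 0, "end_ms": 200, "speaker": "1"},
    {"text": " there", "start_ms": 210, "end_ms": 480, "speaker": "1"},
    {"text": " Hello", "start_ms": 900, "end_ms": 1300, "speaker": "2"},
]

for turn in speaker_turns(sample_tokens):
    print(
        f"[{turn['start_ms']}-{turn['end_ms']} ms] "
        f"Speaker {turn['speaker']}: {turn['text']}"
    )
```

With a real transcription, pass docs[0].metadata["tokens"] in place of the sample list. When diarization is disabled, the speaker lookup falls back to None and all tokens end up in a single turn.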
Learn more about the Soniox API reference.