From b9831a890fd4a275d1596505b721613a27a06bec Mon Sep 17 00:00:00 2001
From: Yahia <42359972+yaiir-a@users.noreply.github.com>
Date: Wed, 6 May 2026 14:50:12 +0100
Subject: [PATCH 1/2] lang hints

---
 docs/private/next-gen-model.mdx | 61 +++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/docs/private/next-gen-model.mdx b/docs/private/next-gen-model.mdx
index a4cd033..1198444 100644
--- a/docs/private/next-gen-model.mdx
+++ b/docs/private/next-gen-model.mdx
@@ -18,7 +18,17 @@ Once you have your API key, there are a few ways to try out the new model:
 2. Try the API using the reference documentation below
 
 ## API reference
-The API schema is consistent with the [V2 Batch REST API](https://docs.speechmatics.com/api-ref/batch/create-a-new-job). Example minimal transcription config:
+The API schema is consistent with the [V2 Batch REST API](https://docs.speechmatics.com/api-ref/batch/create-a-new-job) with some minor additions.
+
+### Supported Endpoints
+
+| Region | Endpoint |
+| ------------- | --------------- |
+| EU1 (Europe) | **eu1.asr.api.speechmatics.com** |
+| US1 (USA) | **us1.asr.api.speechmatics.com** |
+
+### Minimal example
+Example minimal transcription config:
 ```json
 {
   "type": "transcription",
@@ -30,7 +40,33 @@ The API schema is consistent with the [V2 Batch REST API](https://docs.speechmat
   }
 }
 ```
+### Language hints
+If you know which languages are spoken in a given audio file, you can optionally provide a list of language hints to reduce the likelihood of unexpected languages and scripts being output.
+
+Example config with language hints:
+
+```json
+{
+  "type": "transcription",
+  "transcription_config": {
+    "language": "multi",
+    "operating_point": "omni-v1",
+    // highlight-start
+    "language_hints": ["en", "ar"]
+    // highlight-end
+  }
+}
+```
+
+- You can specify any number of Speechmatics [supported language ISO codes](../speech-to-text/languages#transcription-languages)
+- For monolingual audio files, specifying the single language present can also improve WER by around 2% relative
+
+### Language labelling
+The JSON output already includes a `language` property for each predicted word.
+The `standard` and `enhanced` models will predict the same language for every word in the file.
+The `omni-v1` model will more accurately predict the language being spoken at a granularity of around 14 seconds.
+
+#### Language pack information
 
 Two new properties have been introduced into the returned job metadata based on languages appearing in the transcript:
 
@@ -50,13 +86,6 @@ Two new properties have been introduced into the returned job metadata based on
   }
 }
 ```
-### Supported Endpoints
-
-| Region | Endpoint |
-| ------------- | --------------- |
-| EU1 (Europe) | **eu1.asr.api.speechmatics.com** |
-| US1 (USA) | **us1.asr.api.speechmatics.com** |
-
 ## Capabilities
 
 | Feature | Current capability | Future plans |
@@ -73,7 +102,7 @@ Two new properties have been introduced into the returned job metadata based on
 | Word timings | ✅ Match current behaviour| |
 | Punctuation | ✅ Match current behaviour| |
 | Notifications | ✅ Match current behaviour| |
-| Output locale | ⚠️ Not available| Q2: Matching current behaviour |
+| Output locale | ✅ Match current behaviour| |
 | Profanity tagging | ⚠️ Not available | Q3: Match current behaviour |
 | Entity detection | ⚠️ Not available| Prioritised according to customer need |
 | Audio volume filtering | ⚠️ Not available| Prioritised according to customer need |
@@ -111,19 +140,7 @@ docker run --rm -it \
 ```
 
 #### Transcription config
-It is advised to create a config file for the transcriptions jobs. An example of the minimum required is below:
-
-Create the config.json file with the following contents
-
-```
-{
-  "type": "transcription",
-  "transcription_config": {
-    "language":"multi",
-    "model": "omni-v1"
-  }
-}
-```
+It is advised to create a config.json file for transcription jobs. A [minimal example](#minimal-example) is shown above.
 
 #### Running the transcription
 

From 53223fd182137a1ffc3c0e842e88102fcee4e910 Mon Sep 17 00:00:00 2001
From: Yahia <42359972+yaiir-a@users.noreply.github.com>
Date: Wed, 6 May 2026 16:25:27 +0100
Subject: [PATCH 2/2] fix heading

---
 docs/private/next-gen-model.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/private/next-gen-model.mdx b/docs/private/next-gen-model.mdx
index 1198444..010cb07 100644
--- a/docs/private/next-gen-model.mdx
+++ b/docs/private/next-gen-model.mdx
@@ -66,7 +66,7 @@ The JSON output already includes a `language` property for each predicted word.
 The `standard` and `enhanced` models will predict the same language for every word in the file.
 The `omni-v1` model will more accurately predict the language being spoken at a granularity of around 14 seconds.
 
-#### Language pack information
+### Language pack information
 
 Two new properties have been introduced into the returned job metadata based on languages appearing in the transcript:
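The transcription configs added in this patch are plain JSON, so client code can assemble them programmatically rather than hand-editing config.json. A minimal Python sketch of that, with and without `language_hints`: the `build_transcription_config` helper is a hypothetical name for illustration, not part of any Speechmatics SDK; the field names and values are taken from the examples in the patch.

```python
import json

def build_transcription_config(language_hints=None, operating_point="omni-v1"):
    """Assemble a minimal multilingual transcription config (hypothetical helper).

    language_hints: optional list of supported ISO language codes, e.g.
    ["en", "ar"]; the key is omitted entirely when no hints are given.
    """
    transcription_config = {
        "language": "multi",
        "operating_point": operating_point,
    }
    if language_hints:
        transcription_config["language_hints"] = list(language_hints)
    return {"type": "transcription", "transcription_config": transcription_config}

# Without hints: mirrors the "Minimal example" section.
print(json.dumps(build_transcription_config(), indent=2))

# With hints: mirrors the "Language hints" example.
print(json.dumps(build_transcription_config(["en", "ar"]), indent=2))
```

The resulting object would be sent as the `config` part of the job-creation request described in the V2 Batch REST API reference linked above.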