
Transcription API response is missing diarized data for response_format=diarized_json #652

@awa-xima

Description


Issue

When calling openAiClient.audio().transcriptions().create(createParams) with the response format AudioResponseFormat.DIARIZED_JSON, the returned TranscriptionCreateResponse has no value for the diarized field. Instead, the entire raw JSON response ends up in the text field of the transcription.

Expected behavior

TranscriptionCreateResponse#diarized() returns a non-empty Optional with the contents of the diarized response.

Workaround

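As a workaround, the raw JSON string in the text field can be parsed manually:

```java
// Parse the raw diarized JSON that the SDK left in transcription().text()
TranscriptionDiarized diarized = new ObjectMapper().readValue(response.transcription().get().text(), TranscriptionDiarized.class);
```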
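Once isJson is fixed, response.transcription() will probably be empty for diarized_json requests, so the .get() in the one-liner above would then throw. The following is a minimal sketch of a more defensive workaround, assuming only the two Optional accessors named in this issue (diarized() and transcription()). The class DiarizedFallback and the method diarizedOrParse are hypothetical names, not part of the SDK, and the parser is passed in as a plain Function so the sketch runs without the SDK or Jackson.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of the openai-java SDK.
final class DiarizedFallback {
    private DiarizedFallback() {}

    // Prefer the typed diarized value; otherwise re-parse raw JSON left in the text field.
    static <T> T diarizedOrParse(Optional<T> diarized, Optional<String> rawText, Function<String, T> parser) {
        if (diarized.isPresent()) {
            // Expected path once the SDK deserializes diarized_json itself.
            return diarized.get();
        }
        String text = rawText.map(String::trim).orElse("");
        // Bug path from this issue: the whole JSON object arrives as plain text.
        if (text.startsWith("{") && text.endsWith("}")) {
            return parser.apply(text);
        }
        throw new NoSuchElementException("no diarized data and no raw JSON text to parse");
    }

    public static void main(String[] args) {
        // Stand-in parser for the demo; real code would wrap ObjectMapper.readValue.
        Function<String, Integer> length = String::length;
        System.out.println(diarizedOrParse(Optional.of(7), Optional.empty(), length));
        System.out.println(diarizedOrParse(Optional.empty(), Optional.of(" {\"text\":\"x\"} "), length));
    }
}
```

Against the SDK, a call could look like diarizedOrParse(response.diarized(), response.transcription().map(t -> t.text()), parser), where parser wraps ObjectMapper.readValue(text, TranscriptionDiarized.class) and rethrows its checked exception as an UncheckedIOException. The same code then keeps working before and after an SDK fix.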

Possible cause

From some debugging, the cause appears to be the AudioResponseFormat#isJson function, which has no case for DIARIZED_JSON. That value falls through to the else branch, which returns false, so the parser treats the response as plain text.

```kotlin
when (this) {
    JSON -> true
    TEXT -> false
    SRT -> false
    VERBOSE_JSON -> true
    VTT -> false
    else -> false
}
```
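A minimal sketch of the likely fix, written in Java against a hypothetical stand-in enum (ResponseFormatSketch is an illustrative name; the SDK's real AudioResponseFormat is a Kotlin class and may carry more machinery). It treats diarized_json as JSON, like json and verbose_json. The wire values are the documented response_format strings.

```java
// Hypothetical Java stand-in for the SDK's Kotlin AudioResponseFormat,
// used only to illustrate the missing DIARIZED_JSON case in isJson().
enum ResponseFormatSketch {
    JSON("json"),
    TEXT("text"),
    SRT("srt"),
    VERBOSE_JSON("verbose_json"),
    VTT("vtt"),
    DIARIZED_JSON("diarized_json");

    private final String wireValue;

    ResponseFormatSketch(String wireValue) {
        this.wireValue = wireValue;
    }

    String wireValue() {
        return wireValue;
    }

    // Exhaustive switch with no default: adding a new constant without
    // deciding whether it is JSON becomes a compile error, not a silent false.
    boolean isJson() {
        return switch (this) {
            case JSON, VERBOSE_JSON, DIARIZED_JSON -> true;
            case TEXT, SRT, VTT -> false;
        };
    }

    public static void main(String[] args) {
        for (ResponseFormatSketch format : values()) {
            System.out.println(format.wireValue() + " -> isJson=" + format.isJson());
        }
    }
}
```

Listing every known format explicitly, instead of relying on a catch-all branch, is what would have surfaced this bug at compile time. The real SDK may still need a fallback branch for values it does not recognize, so this is a design suggestion rather than a drop-in patch.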


Example

An example input/output where I observed the issue:

```
TranscriptionCreateParams{body=Body{file=MultipartField{value=sun.nio.ch.ChannelInputStream@967d60f, contentType=audio/mpeg, filename=sousei_no_onmyouji_short.mp3}, model=MultipartField{value=gpt-4o-transcribe-diarize, contentType=text/plain; charset=utf-8, filename=null}, chunkingStrategy=MultipartField{value=ChunkingStrategy{auto=auto}, contentType=text/plain; charset=utf-8, filename=null}, include=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, knownSpeakerNames=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, knownSpeakerReferences=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, language=MultipartField{value=ja, contentType=text/plain; charset=utf-8, filename=null}, prompt=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, responseFormat=MultipartField{value=diarized_json, contentType=text/plain; charset=utf-8, filename=null}, temperature=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, timestampGranularities=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, additionalProperties={}}, additionalHeaders=Headers{map={}}, additionalQueryParams=QueryParams{map={}}}
```

This produces the following response. Note that the transcription's text field contains the entire JSON string, while diarized is missing (null).

```
TranscriptionCreateResponse{transcription=Transcription{text={"text":"彼女の名はアダ シノベリオ 強力な怨霊を排出 してきた京都の名家ア ダシノ家の筆頭","segments":[{"type":"transcript.text.segment","text":"彼女の名はアダシノベリオ","speaker":"A","start":1.0000000000000002,"end":3.3,"id":"seg_0"},{"type":"transcript.text.segment","text":"強力な怨霊を排出してきた京都の名家アダシノ家の筆頭","speaker":"A","start":3.8,"end":9.4,"id":"seg_1"}],"usage":{"type":"tokens","total_tokens":405,"input_tokens":97,"input_token_details":{"text_tokens":0,"audio_tokens":97},"output_tokens":308}}, logprobs=, usage=, additionalProperties={}}}
```

Remark

The JSON data itself appears correct: parsing the raw JSON manually into an instance of TranscriptionDiarized works.

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working), sdk
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
