Description
Issue
When openAiClient.audio().transcriptions().create(createParams) is called with the response format AudioResponseFormat.DIARIZED_JSON, the returned TranscriptionCreateResponse has no value for the diarized field. Instead, the text field of the transcription contains the entire raw JSON response.
Expected behavior
TranscriptionCreateResponse#diarized() returns a non-empty Optional with the contents of the diarized response.
Workaround
We can read the raw JSON string and parse it manually:
new ObjectMapper().readValue(response.transcription().get().text(), TranscriptionDiarized.class)
Possible cause
After a little debugging, the issue might be in the AudioResponseFormat#isJson function, which has no case for DIARIZED_JSON. As a result, the parser treats the response as plain text.
openai-java/openai-java-core/src/main/kotlin/com/openai/models/audio/AudioResponseFormat.kt
Lines 155 to 162 in 5729c58
when (this) {
    JSON -> true
    TEXT -> false
    SRT -> false
    VERBOSE_JSON -> true
    VTT -> false
    else -> false
}
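The suspected fall-through can be reproduced in isolation. The following is only a sketch with stand-in types: IsJsonSketch, its Format enum, and both methods are hypothetical names made up for illustration and are not the SDK's code. The first method mirrors the branches quoted above. The second shows one possible fix, assuming that adding an explicit DIARIZED_JSON case is all the parser needs.

```java
// Stand-in for illustration only; this is NOT the SDK's AudioResponseFormat class.
final class IsJsonSketch {

    // Hypothetical enum listing the format values named in this issue.
    enum Format { JSON, TEXT, SRT, VERBOSE_JSON, VTT, DIARIZED_JSON }

    // Mirrors the quoted `when` block: DIARIZED_JSON has no case of its own,
    // so it falls through to the default branch and is treated as plain text.
    static boolean isJsonAsQuoted(Format format) {
        switch (format) {
            case JSON: return true;
            case TEXT: return false;
            case SRT: return false;
            case VERBOSE_JSON: return true;
            case VTT: return false;
            default: return false;
        }
    }

    // Assumed fix: treat DIARIZED_JSON as a JSON format explicitly.
    static boolean isJsonWithDiarizedCase(Format format) {
        switch (format) {
            case JSON:
            case VERBOSE_JSON:
            case DIARIZED_JSON:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Print both results side by side for every format value.
        for (Format f : Format.values()) {
            System.out.println(f + ": quoted=" + isJsonAsQuoted(f)
                    + ", with fix=" + isJsonWithDiarizedCase(f));
        }
    }
}
```

In this sketch the two methods disagree only for DIARIZED_JSON, which would be consistent with the observed behavior: the other formats parse as before, while the diarized response is handed back as raw text.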
Example
An example input/output where I observed the issue:
TranscriptionCreateParams{body=Body{file=MultipartField{value=sun.nio.ch.ChannelInputStream@967d60f, contentType=audio/mpeg, filename=sousei_no_onmyouji_short.mp3}, model=MultipartField{value=gpt-4o-transcribe-diarize, contentType=text/plain; charset=utf-8, filename=null}, chunkingStrategy=MultipartField{value=ChunkingStrategy{auto=auto}, contentType=text/plain; charset=utf-8, filename=null}, include=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, knownSpeakerNames=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, knownSpeakerReferences=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, language=MultipartField{value=ja, contentType=text/plain; charset=utf-8, filename=null}, prompt=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, responseFormat=MultipartField{value=diarized_json, contentType=text/plain; charset=utf-8, filename=null}, temperature=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, timestampGranularities=MultipartField{value=null, contentType=text/plain; charset=utf-8, filename=null}, additionalProperties={}}, additionalHeaders=Headers{map={}}, additionalQueryParams=QueryParams{map={}}}
This results in the following. Note that the text field of transcription contains the entire JSON string, but diarized is missing / null.
TranscriptionCreateResponse{transcription=Transcription{text={"text":"彼女の名はアダ シノベリオ 強力な怨霊を排出 してきた京都の名家ア ダシノ家の筆頭","segments":[{"type":"transcript.text.segment","text":"彼女の名はアダシノベリオ","speaker":"A","start":1.0000000000000002,"end":3.3,"id":"seg_0"},{"type":"transcript.text.segment","text":"強力な怨霊を排出してきた京都の名家アダシノ家の筆頭","speaker":"A","start":3.8,"end":9.4,"id":"seg_1"}],"usage":{"type":"tokens","total_tokens":405,"input_tokens":97,"input_token_details":{"text_tokens":0,"audio_tokens":97},"output_tokens":308}}, logprobs=, usage=, additionalProperties={}}}
Remark
The JSON data itself seems correct. When I parse the raw JSON manually into an instance of TranscriptionDiarized, it works:
