Describe the bug
openai.Audio.transcribe(model='whisper-1', file=wavfile, options={"language": "en"})
OpenAI Whisper transcribes the input speech and automatically translates it into a different language. Specifying the language parameter as in the call above has no effect.
Adding an [en] prefix does not solve the problem either. It appears that accents and environmental noise are capable of influencing the language inference model.
Looking at the openai.Audio.transcribe() function, the accepted parameters are 'model', 'file', 'apikey', 'apibase', 'apitype', 'apiversion', 'organization', and 'kwargs'. There is no dedicated parameter to set the language.
It would be nice to have a parameter that forces the ASR to a given language when the auto-inference model is not reliable.
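For comparison, here is a minimal sketch of the kind of call being asked for. It assumes that the extra keyword arguments accepted by openai.Audio.transcribe() are forwarded to the transcription endpoint, and that the endpoint accepts a `language` field (an ISO-639-1 code such as "en"). The helper name `transcribe_forced_language` and its `transcribe_fn` argument are hypothetical and exist only so the sketch can be run without network access.

```python
def transcribe_forced_language(audio_file, language="en", model="whisper-1",
                               transcribe_fn=None):
    """Transcribe audio_file while passing an explicit language hint.

    Hypothetical helper: transcribe_fn is an injection point for testing.
    When it is None, openai.Audio.transcribe (openai 0.27.x) is used.
    """
    if transcribe_fn is None:
        # Imported lazily so the helper can be exercised without the library.
        import openai
        transcribe_fn = openai.Audio.transcribe

    # Assumption: `language` goes in as a top-level keyword argument,
    # not nested inside an `options` dict as in the call reported above.
    result = transcribe_fn(model=model, file=audio_file, language=language)
    return result["text"]
```

With a real file this would be called as `transcribe_forced_language(open("speech.wav", "rb"))`, where `speech.wav` is a placeholder path. Whether the hint fully suppresses the unwanted translation for accented or noisy audio is not something this sketch can confirm.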
To Reproduce
tmp.write(audio.get_wav_data())
tmp.seek(0)  # Reset file pointer to beginning
# Language is passed via an "options" dict; this is the call that gets ignored
whisp = openai.Audio.transcribe(model='whisper-1', file=tmp, options={"language": "en"})
user_input = "[en] " + whisp['text']
print(f'\nYou said:\n"{user_input}"')
messages.append({"role": "user", "content": user_input})
print("\nResponding...\n")
completion = openai.ChatCompletion.create(
    model=MODEL,
    messages=messages,
    temperature=0.8
)
created = str(get_timestamp_from_unix(completion['created']))
response = completion['choices'][0].message.content
messages.append({"role": "assistant", "content": f'Created: {created}\n{response}'})
print(f"\n{created}\n{response}\n")
You said:
"[en] Jadi, perkara ini masih tidak berfungsi. Saya telah menambahkan sufis bahasa Inggeris dan untuk sebab tertentu, ia masih memberi balik transkripsi yang diterjemahkan oleh Malaysia. Jadi, saya yakin ada masalah dengan masalah transkripsi dari
OpenAI Whisper."
Responding...
2023-07-03 15:31:47
Created: 2023-07-03 15:31:42
Maaf atas ketidaknyamanannya. Terkadang, ada kesalahan dalam transkripsi bahasa yang dapat menyebabkan masalah seperti yang
Anda alami. Saya akan mencatat masalah ini dan melaporkannya kepada tim teknis untuk diperiksa. Apakah ada pertanyaan lain yang dapat saya bantu jawab?
Code snippets
No response
OS
Windows
Python version
Python 3.8.3
Library version
openai==0.27.7