Whisper OpenAI - Microphone Audio Input #3479

Closed
1 task done
MichaelMares opened this issue Mar 16, 2023 · 4 comments · Fixed by #3770
Labels
bug Something isn't working
Milestone
3.x

Comments

@MichaelMares

Describe the bug

Hey, I noticed this a couple of days ago. Microphone input (as audio) now throws an error with the OpenAI Whisper model. What used to work fine now requires an additional step: converting the input to .wav format. I'm not sure whether this is a Gradio bug or OpenAI changing the API, but I thought I'd share my findings.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Original code that no longer works:

import gradio as gr
import openai

# Function that takes the audio as input and returns the transcript
def transcribe(audio):
    global messages

    audio_file = open(audio, "rb")
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcript)

...
ui = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath", label="What's on your mind?"), ...

New function that works:

from pydub import AudioSegment
import openai

def transcribe_audio(audio_file):
    # Load the audio file and convert it to .wav format
    AudioSegment.from_file(audio_file).export("converted_audio.wav", format="wav")

    with open("converted_audio.wav", "rb") as audio:
        response = openai.Audio.transcribe('whisper-1', audio)

    return response["text"]

Screenshot

No response

Logs

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
/usr/local/lib/python3.9/dist-packages/gradio/processing_utils.py:239: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "<ipython-input-5-ccc1b109e290>", line 50, in transcribe
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_resources/audio.py", line 57, in transcribe
    response, _, api_key = requestor.request("post", url, files=files, params=data)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 682, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "<ipython-input-5-ccc1b109e290>", line 49, in transcribe
    audio_file = open(audio, "rb")
TypeError: expected str, bytes or os.PathLike object, not NoneType
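
Side note: the second traceback (TypeError: expected str, bytes or os.PathLike object, not NoneType) shows transcribe being called with audio=None, which happens when the handler runs before anything is recorded. A simple guard (my addition, not from the report) avoids that crash:

def transcribe(audio):
    if audio is None:  # nothing recorded yet
        return "Please record some audio first."
    ...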

System Info

Used in Google Colab.

Severity

annoying

@MichaelMares MichaelMares added the bug Something isn't working label Mar 16, 2023
@abidlabs abidlabs added this to the 3.x milestone Mar 17, 2023
@enth0

enth0 commented Mar 21, 2023

Had the same issue today; I fixed it just by appending ".wav" to the name of the temporary file generated by Gradio.

import os
import openai

def transcribe(audio):
    os.rename(audio, audio + '.wav')
    file = open(audio + '.wav', "rb")
    return openai.Audio.transcribe("whisper-1", file).text
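
A variant that copies instead of renaming (so Gradio's temp file stays where Gradio expects it) could look like this; the shutil approach is my suggestion, not part of enth0's fix:

import shutil
import openai

def transcribe(audio):
    wav_path = audio + '.wav'
    shutil.copy(audio, wav_path)  # leave Gradio's original temp file untouched
    with open(wav_path, "rb") as f:
        return openai.Audio.transcribe("whisper-1", f)["text"]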

@azmek

azmek commented Mar 22, 2023

That's a perfect solution, enth0. It makes sense to control the file extension, and it works great.
Thank you.

@YOlOKY

YOlOKY commented Apr 7, 2023

with open(WAVE_OUTPUT_FILENAME, 'rb') as audio_file:
    response = openai.Audio.transcribe("whisper-1", audio_file)
    print('Audio recorded successfully')

'''audio_file = open("text.mp3", "rb")
response = openai.Audio.transcribe("whisper-1", audio_file)
print('Conversion successful')'''
# Extract transcription result
text = response["text"]

Traceback (most recent call last):
  File "F:\AI\openai\GPT_Voice\test_text.py", line 102, in <module>
    text = transcribe_audio()
  File "F:\AI\openai\GPT_Voice\test_text.py", line 59, in transcribe_audio
    response = openai.Audio.transcribe("whisper-1", audio_file)

How do I solve this problem? Does the whisper-1 model need to be downloaded locally?

@Luno-helloworld

def transcribe(audio):
    os.rename(audio, audio + '.wav')
    file = open(audio + '.wav', "rb")
    return openai.Audio.transcribe("whisper-1", file).text

Why is the error message:

  line 22, in transcript
    os.rename(audio, audio + '.wav')
                     ~~~~~~^~~~~~~~
TypeError: can only concatenate tuple (not "str") to tuple
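
A likely cause (my reading; not confirmed in the thread): gr.Audio defaults to type="numpy", so the handler receives a (sample_rate, numpy_array) tuple rather than a file path, and audio + '.wav' fails. Asking Gradio for a filepath makes the rename work:

import gradio as gr

ui = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),  # a path string, not a (rate, array) tuple
    outputs="text",
)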
