Whisper OpenAI - Microphone Audio Input #3479

Closed
1 task done
MichaelMares opened this issue Mar 16, 2023 · 4 comments · Fixed by #3770
Labels
bug Something isn't working
Milestone
3.x

Comments

@MichaelMares

Describe the bug

Hey, I noticed this a couple of days ago. Microphone input (as audio) now throws an error with the OpenAI Whisper model. What used to work fine now requires an additional step: converting the input to .wav format. I'm not sure whether this is a Gradio bug or OpenAI changing the API, but I thought I'd share my findings.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Original code that no longer works:

import gradio as gr
import openai

# Function that takes the audio as input and returns the transcript
def transcribe(audio):
    global messages

    audio_file = open(audio, "rb")
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcript)

...
ui = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath", label="What's on your mind?"), ...

New function that works:

from pydub import AudioSegment
import openai

def transcribe_audio(audio_file):
    # Load the audio file and convert it to .wav format
    AudioSegment.from_file(audio_file).export("converted_audio.wav", format="wav")

    with open("converted_audio.wav", "rb") as audio:
        response = openai.Audio.transcribe('whisper-1', audio)

    return response["text"]

Screenshot

No response

Logs

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
/usr/local/lib/python3.9/dist-packages/gradio/processing_utils.py:239: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "<ipython-input-5-ccc1b109e290>", line 50, in transcribe
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_resources/audio.py", line 57, in transcribe
    response, _, api_key = requestor.request("post", url, files=files, params=data)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py", line 682, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "<ipython-input-5-ccc1b109e290>", line 49, in transcribe
    audio_file = open(audio, "rb")
TypeError: expected str, bytes or os.PathLike object, not NoneType
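
Side note: the second traceback (TypeError: expected str, bytes or os.PathLike object, not NoneType) shows transcribe being called with audio=None, which happens when the handler runs before anything is recorded. A simple guard (my addition, not from the report) avoids that crash:

def transcribe(audio):
    if audio is None:  # nothing recorded yet
        return "Please record some audio first."
    ...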

System Info

Used in Google Colab.

Severity

annoying

@MichaelMares MichaelMares added the bug Something isn't working label Mar 16, 2023
@abidlabs abidlabs added this to the 3.x milestone Mar 17, 2023
@enth0

enth0 commented Mar 21, 2023

Had the same issue today; I fixed it just by appending ".wav" to the name of the temporary file generated by Gradio.

import os
import openai

def transcribe(audio):
    os.rename(audio, audio + '.wav')
    file = open(audio + '.wav', "rb")
    return openai.Audio.transcribe("whisper-1", file).text
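
A variant that copies instead of renaming (so Gradio's temp file stays where Gradio expects it) could look like this; the shutil approach is my suggestion, not part of enth0's fix:

import shutil
import openai

def transcribe(audio):
    wav_path = audio + '.wav'
    shutil.copy(audio, wav_path)  # leave Gradio's original temp file untouched
    with open(wav_path, "rb") as f:
        return openai.Audio.transcribe("whisper-1", f)["text"]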

@azmek

azmek commented Mar 22, 2023

That's a perfect solution, enth0. It makes sense to control the file extension, and it works great.
Thank you.

@YOlOKY

YOlOKY commented Apr 7, 2023

with open(WAVE_OUTPUT_FILENAME, 'rb') as audio_file:
    response = openai.Audio.transcribe("whisper-1", audio_file)
    print('Audio recorded successfully')

'''audio_file = open("text.mp3", "rb")
response = openai.Audio.transcribe("whisper-1", audio_file)
print('Conversion successful')'''
# Extract transcription result
text = response["text"]

Traceback (most recent call last):
  File "F:\AI\openai\GPT_Voice\test_text.py", line 102, in <module>
    text = transcribe_audio()
  File "F:\AI\openai\GPT_Voice\test_text.py", line 59, in transcribe_audio
    response = openai.Audio.transcribe("whisper-1", audio_file)

How do I solve this problem? Does the whisper-1 model need to be downloaded locally?

@Luno-helloworld

def transcribe(audio):
    os.rename(audio, audio + '.wav')
    file = open(audio + '.wav', "rb")
    return openai.Audio.transcribe("whisper-1", file).text

Why is the error message:

  line 22, in transcript
    os.rename(audio, audio + '.wav')
                     ~~~~~~^~~~~~~~
TypeError: can only concatenate tuple (not "str") to tuple
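
A likely cause (my reading; not confirmed in the thread): gr.Audio defaults to type="numpy", so the handler receives a (sample_rate, numpy_array) tuple rather than a file path, and audio + '.wav' fails. Asking Gradio for a filepath makes the rename work:

import gradio as gr

ui = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),  # a path string, not a (rate, array) tuple
    outputs="text",
)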
