
Add option for autoplay in Audio component #1349

Closed
1 task done
versae opened this issue May 20, 2022 · 20 comments · Fixed by #4453

Comments

@versae

versae commented May 20, 2022

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
I'd like to be able to auto-play a sound when it comes back from inference.

Describe the solution you'd like
A new attribute, False by default, on the Audio component to enable/disable autoplay of the HTML audio element.

@abidlabs abidlabs added the enhancement New feature or request label May 21, 2022
@abidlabs abidlabs added good first issue Good for newcomers svelte Frontend-related issue (JS) labels Jul 11, 2022
@abidlabs
Member

Seems reasonable to me. I think this would make sense as a parameter in the "style()" function of the "Audio()" component.

@pngwn
Member

pngwn commented Nov 24, 2022

Autoplaying audio is not allowed in Chrome, although it works in other browsers. Autoplaying videos work in all browsers as long as the video is muted.

There are some workarounds but they are pretty heavy and can be inconsistent.

Will see if we can find any other options here, since there have already been user interactions.

@emilyuhde
Contributor

I would recommend against using any workaround to enforce autoplay; use the HTML autoplay attribute instead and just let the browsers do what they will do. Audio and video autoplay can cause accessibility issues, especially for users who rely on screen readers. If you use autoplay and ensure there is a way for users to pause audio and video, it shouldn't cause any a11y issues.

http://www.w3.org/TR/WCAG20-TECHS/F93.html

@pngwn
Member

pngwn commented Dec 8, 2022

In most cases, this is slightly different from normal autoplaying video: it is similar to a user uploading a video via a file input and that video then autoplaying (as opposed to autoplaying on page load), but the main issue still stands. We would need to focus an appropriate part of the screen (possibly the video element itself).

On balance, I think autoplaying with the video muted (Chrome's default) is probably the best approach if we decide to support this.

I think this would make sense as a parameter in the "style()" function of the "Audio()" component.

I don't agree with this, though: this isn't a stylistic choice but changes the behaviour of the UI. It should be a kwarg and updatable via gr.update, imo.
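
For reference, a rough sketch of what such a kwarg could look like (the autoplay parameter name here is an assumption, not a shipped API at the time of writing, and the tone generator stands in for a real TTS call):

    import numpy as np
    import gradio as gr

    def tts(text):
        # Stand-in for a real TTS call: return a 1-second 440 Hz tone as (sample_rate, array)
        sr = 16000
        t = np.linspace(0, 1, sr, endpoint=False)
        return sr, (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    with gr.Blocks() as demo:
        text = gr.Textbox()
        audio = gr.Audio(autoplay=True)  # hypothetical kwarg, updatable like any other prop
        text.submit(tts, inputs=text, outputs=audio)

    demo.launch()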

@pi43r

pi43r commented Dec 14, 2022

I needed this for a small internal demo, where we wanted to output text-to-speech without the user pressing an additional button. My workaround looks like this:

    audio_el = gr.Audio(type="numpy", elem_id="speaker")
    autoplay_audio = """async () => {
        console.log('playing audio in 2 seconds')
        let gradioEl = document.querySelector('body > gradio-app').shadowRoot;
        setTimeout(() => {
            let audioplayer = gradioEl.querySelector('#speaker > audio');
            audioplayer.play();
        }, 2000)
        }"""
    transcription.change(fn=update_audio, inputs=state, outputs=audio_el, _js=autoplay_audio)

It works fine, but obviously the 2-second delay is hard-coded, because this is approximately how long our system needs to display the audio player. Is there a way to get the state of the audio component?
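
One untested idea for avoiding the fixed delay (reusing the same #speaker selector and _js wiring as above) would be to watch for the audio element to appear instead of waiting:

    autoplay_when_ready = """async () => {
        let gradioEl = document.querySelector('body > gradio-app').shadowRoot;
        let container = gradioEl.querySelector('#speaker');
        // play as soon as the <audio> element shows up, instead of a hard-coded timeout
        const observer = new MutationObserver(() => {
            let audioplayer = gradioEl.querySelector('#speaker > audio');
            if (audioplayer) {
                observer.disconnect();
                audioplayer.play();
            }
        });
        observer.observe(container, { childList: true, subtree: true });
        }"""
    transcription.change(fn=update_audio, inputs=state, outputs=audio_el, _js=autoplay_when_ready)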

@Arcadia822

(Quoting @pi43r's workaround above.)

I'm in the same situation, showcasing an AI chat demo with a TTS feature and no user interaction.

@abidlabs
Member

abidlabs commented Mar 5, 2023

Agreed. It would be great to enable this, as well as https://github.com/gradio-app/gradio/discussions/3316. cc @aliabid94

@tszumowski

@pi43r and/or @Arcadia822, it sounds like you got a workaround running? @pi43r, thank you for the code snippet. Unfortunately, I'm having trouble getting it to work: I either end up with a None input, or I just don't see any of the JavaScript running. Did you have to configure your browser, or use a specific browser, to get it working? Can you describe what update_audio and state are in that code, so I can understand what is feeding into audio_el?

@pi43r

pi43r commented Mar 10, 2023

(Quoting @tszumowski's question above.)

It's been a while since I wrote the code, and Gradio might have changed. It should work in any browser (maybe not in Safari, because Apple is different).
update_audio in our case was a simple function that generates the audio, converts it to a numpy array, and then places it in the state list.
https://gradio.app/interface-state/
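
Roughly, the wiring looked something like this (an untested, simplified sketch; the tone generator is a stand-in for our actual TTS call):

    import numpy as np
    import gradio as gr

    def update_audio(history):
        # `history` is the gr.State list; the last entry would be the text to speak
        sr = 16000
        t = np.linspace(0, 1, sr, endpoint=False)
        wave = (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)  # stand-in for real TTS
        return sr, wave

    with gr.Blocks() as demo:
        state = gr.State([])
        transcription = gr.Textbox()
        audio_el = gr.Audio(type="numpy", elem_id="speaker")
        # plus _js=autoplay_audio from the earlier snippet to trigger playback
        transcription.change(fn=update_audio, inputs=state, outputs=audio_el)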

I might be able to make a minimal reproduction in the future on huggingface...

@robjm16

robjm16 commented Mar 10, 2023

Strictly as a quick workaround, when I launched the interface of a chatbot I was playing with, I set debug=True and the audio played as the text was returned.

An audio player box popped up below the gradio input/output boxes, but the audio played automatically.

I also suppressed all warnings, to keep them from displaying.

@tszumowski

@robjm16

Strictly as a quick workaround, when I launched the interface of a chatbot I was playing with, I set debug=True and the audio played as the text was returned.

Thanks for the tip. I tried debug=True in both the launch function and the audio block, but no luck on my end. Though it may be because I'm using Blocks instead? Where did you place your debug=True?

@robjm16

robjm16 commented Mar 14, 2023 via email

@robjm16

robjm16 commented Mar 14, 2023

I was trying to get this voice-enabled chatbot to work (code below). I wasn't using Gradio Blocks.
Most of the code is taken from:
https://github.com/hackingthemarkets/chatgpt-api-whisper-api-voice-assistant/blob/main/therapist.py

import openai
import gradio as gr
import gtts
from playsound import playsound
from gtts import gTTS  # Import Google Text to Speech
from IPython.display import display, Audio  # Import display and Audio from IPython

import warnings
warnings.filterwarnings("ignore")
openai.api_key = "YOUR KEY HERE"

Create list of messages, starting with initial message to the system

messages = [{"role": "system", "content": 'You are a therapist. Respond to all input in 25 words or less.'}]

def transcribe(audio):
    """
    Transcribes the user's audio input using the OpenAI API,
    generates a response from the chatbot using GPT-3, converts the response into
    speech using the gTTS library, updates the conversation history, and returns
    the updated conversation history as a string.

    Parameters:
    audio (str): The filepath of the audio file containing the user's input.

    Returns:
    str: A string containing the updated conversation history, with each message formatted as "role: content" and separated by two newlines.
    """
    # Declare messages a global variable (not local to the function)
    global messages

    # Get the user's audio, transcribe it and append it to messages
    audio_file = open(audio, "rb")  # "open" is a built-in Python command
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    messages.append({"role": "user", "content": transcript["text"]})

    # Get the therapist's response, append to messages
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    system_message = response["choices"][0]["message"]
    messages.append(system_message)

    # Create audio from the therapist's text response
    msg = system_message["content"]
    # print(msg)  # For validation
    talk_file = make_into_speech(msg)
    display(Audio(talk_file, autoplay=True))

    # Update the rolling chat transcript
    chat_transcript = ""
    for message in messages:
        if message['role'] != 'system':
            chat_transcript += message['role'] + ": " + message['content'] + "\n\n"

    return chat_transcript

def make_into_speech(words):
    """
    Takes a string as input, converts it to speech using the gTTS library,
    saves the speech as a WAV file, and returns the filepath of the saved WAV file.

    Parameters:
    - words (str): The input string to convert to speech.

    Returns:
    - sound_file (str): The filepath of the saved WAV file.

    Example:
    >>> make_into_speech('Hello, how are you today?')
    '2.wav'

    The function converts the input string to speech and returns the filepath of the saved WAV file.
    """
    tts = gTTS(words)  # Provide the string to convert to speech
    tts.save('2.wav')  # Save the string converted to speech as a .wav file
    sound_file = '2.wav'
    return sound_file

Launch the interface

ui = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath", label="Record Here"), outputs=[gr.Text(label="Chat Transcript")])
ui.launch(debug=True)

@abidlabs abidlabs added this to the 3.x milestone Mar 17, 2023
@tszumowski

Thanks again for the additional info. I was able to get it to work for my use case using _js as described above, but with a slightly different JavaScript string:
https://github.com/tszumowski/vocaltales_storyteller_chatbot/blob/46d46799ff8cdff2f016e484b0ade0e14cb12f8a/storyteller.py#L236-L242

    autoplay_audio = """
            async () => {{
                setTimeout(() => {{
                    document.querySelector('#speaker audio').play();
                }}, {speech_delay});
            }}
        """

@CsqTom

CsqTom commented Mar 30, 2023

I now use this method to add autoplay; the code is below:

import gradio as gr
from gtts import gTTS
from io import BytesIO
import base64


def text_to_speech(text):
    tts = gTTS(text)
    tts.save('hello_world.mp3')

    audio_bytes = BytesIO()
    tts.write_to_fp(audio_bytes)
    audio_bytes.seek(0)

    audio = base64.b64encode(audio_bytes.read()).decode("utf-8")
    audio_player = f'<audio src="data:audio/mpeg;base64,{audio}" controls autoplay></audio>'

    return audio_player


with gr.Blocks() as demo:
    html = gr.HTML()
    # html.visible = False

    text = gr.Text()
    btn = gr.Button("OK")
    btn.click(text_to_speech, inputs=[text], outputs=[html])

demo.launch()

@robjm16

robjm16 commented Mar 30, 2023 via email

@NowLoadY

NowLoadY commented Apr 4, 2023

(Quoting @CsqTom's autoplay-via-HTML snippet above.)

Thank you! It works!
Since I am using .wav, I ended up with this:

def audio_to_html(audio):
    audio_bytes = BytesIO()
    wavio.write(audio_bytes, audio[1].astype(np.float32), audio[0], sampwidth=4)
    audio_bytes.seek(0)

    audio_base64 = base64.b64encode(audio_bytes.read()).decode("utf-8")
    audio_player = f'<audio src="data:audio/wav;base64,{audio_base64}" controls autoplay></audio>'

    return audio_player

and adding:

import wavio
import numpy as np  # also needed here, along with BytesIO and base64 from the snippet above

@asiffarhankhan

Thanks, the above solution worked for me. For a more beginner-friendly description of what's going on:

  • You have audio; consider it to be in bytes (or convert it to bytes).
  • Convert it to a BytesIO object using audio_io = BytesIO(audio_in_bytes) (library: from io import BytesIO).
  • Point it to the beginning of the audio file using audio_io.seek(0).
  • Convert audio_io to base64 so it can be embedded in HTML, using audio_base64 = base64.b64encode(audio_io.read()).decode("utf-8") (library: import base64).
  • Finally, write the HTML snippet as a string that invokes autoplay for your audio: audio_html = f'<audio src="data:audio/mpeg;base64,{audio_base64}" controls autoplay></audio>'

Use audio_html as your output in Gradio to autoplay the converted audio; see the sketch below. This can be adapted to your use case.
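
Putting those steps together (a minimal sketch; the helper name and the assumption that the input bytes are already MP3-encoded are mine):

    import base64
    from io import BytesIO

    def audio_bytes_to_html(speech_bytes: bytes) -> str:
        # Wrap the already-encoded audio bytes in a BytesIO buffer
        audio_io = BytesIO(speech_bytes)
        audio_io.seek(0)  # point to the beginning before reading
        # Base64-encode so the audio can be inlined as a data: URI
        audio_base64 = base64.b64encode(audio_io.read()).decode("utf-8")
        # The autoplay attribute makes the browser start playback when the element renders
        return f'<audio src="data:audio/mpeg;base64,{audio_base64}" controls autoplay></audio>'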

@tszumowski

@CsqTom

I now use this method to add autoplay; the code is below:

Thank you! Your code worked perfectly when I integrated it into my codebase, and cleaned it up a lot too! I appreciate the reproducible example you provided as well!

@versae
Author

versae commented Jun 9, 2023

🙌🏼
