
Add option for autoplay in Audio component #1349

Closed
1 task done
versae opened this issue May 20, 2022 · 20 comments · Fixed by #4453

Comments

@versae

versae commented May 20, 2022

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
I'd like to be able to auto-play a sound when it comes back from inference.

Describe the solution you'd like
A new attribute, False by default, on the Audio component to enable/disable autoplay of the HTML audio element.

@abidlabs abidlabs added the enhancement New feature or request label May 21, 2022
@abidlabs abidlabs added good first issue Good for newcomers svelte Frontend-related issue (JS) labels Jul 11, 2022
@abidlabs
Member

Seems reasonable to me. I think this would make sense as a parameter in the "style()" function of the "Audio()" component.

@pngwn
Member

pngwn commented Nov 24, 2022

Autoplaying audio is not allowed in Chrome, although it works in other browsers. Autoplaying videos work in all browsers as long as the video is muted.

There are some workarounds but they are pretty heavy and can be inconsistent.

Will see if we can find any other options here, since there have already been user interactions.

@emilyuhde
Contributor

I would recommend against using any workaround to enforce autoplay; use the HTML autoplay attribute instead and just let the browsers do what they will do. Audio and video autoplay can cause accessibility issues, especially for users who rely on screen readers. If you use autoplay and ensure there is a way for users to pause audio and video, it shouldn't cause any a11y issues.

http://www.w3.org/TR/WCAG20-TECHS/F93.html

@pngwn
Member

pngwn commented Dec 8, 2022

In most cases, this is slightly different from normal autoplaying video: it is similar to a user uploading a video via a file input and that video then autoplaying (as opposed to autoplaying on page load), but the main issue still stands. We would need to focus an appropriate part of the screen (possibly the video element itself).

On balance, I think autoplaying with the video muted (Chrome's default) is probably the best approach if we decide to support this.

I think this would make sense as a parameter in the "style()" function of the "Audio()" component.

I don't agree with this, though: this isn't a stylistic choice but changes the behaviour of the UI. It should be a kwarg and updatable via gr.update, imo.
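
For reference, a rough sketch of what such a kwarg could look like (the autoplay parameter name here is an assumption, not a shipped API at the time of writing, and the tone generator stands in for a real TTS call):

    import numpy as np
    import gradio as gr

    def tts(text):
        # Stand-in for a real TTS call: return a 1-second 440 Hz tone as (sample_rate, array)
        sr = 16000
        t = np.linspace(0, 1, sr, endpoint=False)
        return sr, (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    with gr.Blocks() as demo:
        text = gr.Textbox()
        audio = gr.Audio(autoplay=True)  # hypothetical kwarg, updatable like any other prop
        text.submit(tts, inputs=text, outputs=audio)

    demo.launch()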

@pi43r

pi43r commented Dec 14, 2022

I needed this for a small internal demo, where we wanted to output text-to-speech without the user pressing an additional button. My workaround looks like this:

    audio_el = gr.Audio(type="numpy", elem_id="speaker")
    autoplay_audio = """async () => {
        console.log('playing audio in 2 seconds')
        let gradioEl = document.querySelector('body > gradio-app').shadowRoot;
        setTimeout(() => {
            let audioplayer = gradioEl.querySelector('#speaker > audio');
            audioplayer.play();
        }, 2000)
        }"""
    transcription.change(fn=update_audio, inputs=state, outputs=audio_el, _js=autoplay_audio)

It works fine, but obviously the 2-second delay is hard-coded, because this is approximately how long our system needs to display the audio player. Is there a way to get the state of the audio component?
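
One untested idea for avoiding the fixed delay (reusing the same #speaker selector and _js wiring as above) would be to watch for the audio element to appear instead of waiting:

    autoplay_when_ready = """async () => {
        let gradioEl = document.querySelector('body > gradio-app').shadowRoot;
        let container = gradioEl.querySelector('#speaker');
        // play as soon as the <audio> element shows up, instead of a hard-coded timeout
        const observer = new MutationObserver(() => {
            let audioplayer = gradioEl.querySelector('#speaker > audio');
            if (audioplayer) {
                observer.disconnect();
                audioplayer.play();
            }
        });
        observer.observe(container, { childList: true, subtree: true });
        }"""
    transcription.change(fn=update_audio, inputs=state, outputs=audio_el, _js=autoplay_when_ready)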

@Arcadia822

(Quoting @pi43r's workaround above.)

I'm in the same situation, showcasing an AI chat demo with a TTS feature and no user interaction.

@abidlabs
Member

abidlabs commented Mar 5, 2023

Agreed. It would be great to enable this, as well as https://github.com/gradio-app/gradio/discussions/3316. cc @aliabid94

@tszumowski

@pi43r and/or @Arcadia822, it sounds like you got a workaround running? @pi43r, thank you for the code snippet. Unfortunately, I'm having trouble getting it to work: I either end up with a None input, or I just don't see any of the JavaScript running. Did you have to configure your browser, or use a specific browser, to get it working? Can you describe what update_audio and state are in that code, so I can understand what is feeding into audio_el?

@pi43r

pi43r commented Mar 10, 2023

(Quoting @tszumowski's question above.)

It's been a while since I wrote the code, and Gradio might have changed. It should work in any browser (maybe not in Safari, because Apple is different).
update_audio in our case was a simple function that generates the audio, converts it to a numpy array, and then places it in the state list.
https://gradio.app/interface-state/
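
Roughly, the wiring looked something like this (an untested, simplified sketch; the tone generator is a stand-in for our actual TTS call):

    import numpy as np
    import gradio as gr

    def update_audio(history):
        # `history` is the gr.State list; the last entry would be the text to speak
        sr = 16000
        t = np.linspace(0, 1, sr, endpoint=False)
        wave = (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)  # stand-in for real TTS
        return sr, wave

    with gr.Blocks() as demo:
        state = gr.State([])
        transcription = gr.Textbox()
        audio_el = gr.Audio(type="numpy", elem_id="speaker")
        # plus _js=autoplay_audio from the earlier snippet to trigger playback
        transcription.change(fn=update_audio, inputs=state, outputs=audio_el)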

I might be able to make a minimal reproduction in the future on huggingface...

@robjm16

robjm16 commented Mar 10, 2023

Strictly as a quick workaround, when I launched the interface of a chatbot I was playing with, I set debug=True and the audio played as the text was returned.

An audio player box popped up below the gradio input/output boxes, but the audio played automatically.

I also suppressed all warnings, to keep them from displaying.

@tszumowski

@robjm16

Strictly as a quick workaround, when I launched the interface of a chatbot I was playing with, I set debug=True and the audio played as the text was returned.

Thanks for the tip. I tried debug=True in both the launch function and the audio block, but no luck on my end. Though it may be because I'm using Blocks instead? Where did you place your debug=True?

@robjm16

robjm16 commented Mar 14, 2023 via email

@robjm16

robjm16 commented Mar 14, 2023

I was trying to get this voice-enabled chatbot to work (code below). I wasn't using Gradio Blocks.
Most of the code is taken from:
https://github.com/hackingthemarkets/chatgpt-api-whisper-api-voice-assistant/blob/main/therapist.py

import openai
import gradio as gr
import gtts
from playsound import playsound
from gtts import gTTS  # Import Google Text to Speech
from IPython.display import display, Audio  # Import display and Audio from IPython

import warnings
warnings.filterwarnings("ignore")
openai.api_key = "YOUR KEY HERE"

Create list of messages, starting with initial message to the system

messages = [{"role": "system", "content": 'You are a therapist. Respond to all input in 25 words or less.'}]

def transcribe(audio):
    """
    Transcribes the user's audio input using the OpenAI API,
    generates a response from the chatbot using GPT-3, converts the response into
    speech using the gTTS library, updates the conversation history, and returns
    the updated conversation history as a string.

    Parameters:
    audio (str): The filepath of the audio file containing the user's input.

    Returns:
    str: A string containing the updated conversation history, with each message formatted as "role: content" and separated by two newlines.
    """
    # Declare messages a global variable (not local to the function)
    global messages

    # Get the user's audio, transcribe it and append it to messages
    audio_file = open(audio, "rb")  # "open" is a built-in Python command
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    messages.append({"role": "user", "content": transcript["text"]})

    # Get the therapist's response, append to messages
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    system_message = response["choices"][0]["message"]
    messages.append(system_message)

    # Create audio from the therapist's text response
    msg = system_message["content"]
    # print(msg)  # For validation
    talk_file = make_into_speech(msg)
    display(Audio(talk_file, autoplay=True))

    # Update the rolling chat transcript
    chat_transcript = ""
    for message in messages:
        if message['role'] != 'system':
            chat_transcript += message['role'] + ": " + message['content'] + "\n\n"

    return chat_transcript

def make_into_speech(words):
    """
    Takes a string as input, converts it to speech using the gTTS library,
    saves the speech as a WAV file, and returns the filepath of the saved WAV file.

    Parameters:
    - words (str): The input string to convert to speech.

    Returns:
    - sound_file (str): The filepath of the saved WAV file.

    Example:
    >>> make_into_speech('Hello, how are you today?')
    '2.wav'

    The function converts the input string to speech and returns the filepath of the saved WAV file.
    """
    tts = gTTS(words)  # Provide the string to convert to speech
    tts.save('2.wav')  # Save the string converted to speech as a .wav file
    sound_file = '2.wav'
    return sound_file

Launch the interface

ui = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath", label="Record Here"), outputs=[gr.Text(label="Chat Transcript")])
ui.launch(debug=True)

@abidlabs abidlabs added this to the 3.x milestone Mar 17, 2023
@tszumowski

Thanks again for the additional info. I was able to get it to work for my use case using _js as described above, but with a slightly different JavaScript string:
https://github.com/tszumowski/vocaltales_storyteller_chatbot/blob/46d46799ff8cdff2f016e484b0ade0e14cb12f8a/storyteller.py#L236-L242

    autoplay_audio = """
            async () => {{
                setTimeout(() => {{
                    document.querySelector('#speaker audio').play();
                }}, {speech_delay});
            }}
        """

@CsqTom

CsqTom commented Mar 30, 2023

I now use this method to add autoplay; the code is below:

import gradio as gr
from gtts import gTTS
from io import BytesIO
import base64


def text_to_speech(text):
    tts = gTTS(text)
    tts.save('hello_world.mp3')

    audio_bytes = BytesIO()
    tts.write_to_fp(audio_bytes)
    audio_bytes.seek(0)

    audio = base64.b64encode(audio_bytes.read()).decode("utf-8")
    audio_player = f'<audio src="data:audio/mpeg;base64,{audio}" controls autoplay></audio>'

    return audio_player


with gr.Blocks() as demo:
    html = gr.HTML()
    # html.visible = False

    text = gr.Text()
    btn = gr.Button("OK")
    btn.click(text_to_speech, inputs=[text], outputs=[html])

demo.launch()

@robjm16

robjm16 commented Mar 30, 2023 via email

@NowLoadY

NowLoadY commented Apr 4, 2023

(Quoting @CsqTom's autoplay-via-HTML snippet above.)

Thank you! It works!
Since I am using .wav, I ended up with this:

def audio_to_html(audio):
    audio_bytes = BytesIO()
    wavio.write(audio_bytes, audio[1].astype(np.float32), audio[0], sampwidth=4)
    audio_bytes.seek(0)

    audio_base64 = base64.b64encode(audio_bytes.read()).decode("utf-8")
    audio_player = f'<audio src="data:audio/wav;base64,{audio_base64}" controls autoplay></audio>'

    return audio_player

and adding:

import wavio
import numpy as np  # also needed here, along with BytesIO and base64 from the snippet above

@asiffarhankhan

Thanks, the above solution worked for me. For a more beginner-friendly description of what's going on:

  • You have audio; consider it to be in bytes (or convert it to bytes).
  • Convert it to a BytesIO object using audio_io = BytesIO(audio_in_bytes) (library: from io import BytesIO).
  • Point it to the beginning of the audio file using audio_io.seek(0).
  • Convert audio_io to base64 so it can be embedded in HTML, using audio_base64 = base64.b64encode(audio_io.read()).decode("utf-8") (library: import base64).
  • Finally, write the HTML snippet as a string that invokes autoplay for your audio: audio_html = f'<audio src="data:audio/mpeg;base64,{audio_base64}" controls autoplay></audio>'

Use audio_html as your output in Gradio to autoplay the converted audio; see the sketch below. This can be adapted to your use case.
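
Putting those steps together (a minimal sketch; the helper name and the assumption that the input bytes are already MP3-encoded are mine):

    import base64
    from io import BytesIO

    def audio_bytes_to_html(speech_bytes: bytes) -> str:
        # Wrap the already-encoded audio bytes in a BytesIO buffer
        audio_io = BytesIO(speech_bytes)
        audio_io.seek(0)  # point to the beginning before reading
        # Base64-encode so the audio can be inlined as a data: URI
        audio_base64 = base64.b64encode(audio_io.read()).decode("utf-8")
        # The autoplay attribute makes the browser start playback when the element renders
        return f'<audio src="data:audio/mpeg;base64,{audio_base64}" controls autoplay></audio>'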

@tszumowski

@CsqTom

I now use this method to add autoplay; the code is below:

Thank you! Your code worked perfectly when I integrated it into my codebase, and cleaned it up a lot too! I appreciate the reproducible example you provided as well!

@versae
Author

versae commented Jun 9, 2023

🙌🏼
