support Audio IO in Chatbot #2768

sijunhe · 2022-12-07T05:38:18Z

I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
Given that Chatbot now supports images after v3.12.0, I think we should double down on the multi-modality and add Audio IO to the chatbot interface. It will be a great interface for Voice Assistant demos.

Describe the solution you'd like
Gradio already has all the building blocks for this feature; they just need to be put together. I think something like the WeChat interface would be nice. Inside the chatbot interface, each text message is replaced with a bar representing the audio output, which plays the audio when clicked. The audio input button can then go somewhere prominent for easy input.

paulocoutinhox · 2023-02-04T18:24:20Z

+1

abidlabs · 2023-02-06T19:13:44Z

cc @dawoodkhan82

rishikeshF · 2023-03-25T23:50:05Z

+1

abidlabs · 2023-03-26T01:51:54Z

Hi @rishikeshF you can already add audio files to your gr.Chatbot as long as you are using the latest version of gradio. Upgrade to the latest version (pip install --upgrade gradio) and then take a look at this section to add media files: https://gradio.app/creating-a-chatbot/#adding-markdown-images-audio-or-videos

jpmcarrilho · 2023-04-05T15:29:00Z

Hello, I'm building a full AI assistant bot on top of Gradio (wwd, s2t, information retrieval), and I have to control how many interfaces and components render or produce output. No component responds as quickly and reactively as the chatbot component, which is why I love it. It's very powerful and straightforward to use. But I think this issue is still not finished: we need to be able to record audio the same way we currently write text and press enter to interact. That is: record audio, stop recording, press enter, handle any post-processing (wav2vec2, whisper), then use LLMs and neural search. I think this is the correct implementation. The trend in the field is toward more and more multimodality (writing, recording, sending an image), as in today's 'XXX GPTs'. This implementation would certainly be a nice addition to the library.
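The flow described above (record, stop, post-process, respond) can be sketched as a single function. This is only an illustration under assumptions: `voice_turn`, `transcribe`, and `respond` are hypothetical names, not part of Gradio. `transcribe` stands in for a speech-to-text model such as whisper or wav2vec2, and `respond` stands in for an LLM or neural search backend.

```python
from typing import Callable, List, Optional, Tuple

# One chat turn: (what the user said, what the assistant replied).
Turn = Tuple[str, str]


def voice_turn(
    audio_path: Optional[str],
    history: List[Turn],
    transcribe: Callable[[str], str],            # hypothetical speech-to-text hook
    respond: Callable[[str, List[Turn]], str],   # hypothetical LLM/search hook
) -> List[Turn]:
    """Run one recorded utterance through speech-to-text and a responder.

    Returns a new history list; the input list is not modified.
    """
    if not audio_path:
        # Nothing was recorded (or the recording was cleared).
        return list(history)
    text = transcribe(audio_path).strip()
    if not text:
        # Silence or an empty transcript: leave the chat unchanged.
        return list(history)
    reply = respond(text, history)
    return list(history) + [(text, reply)]
```

Keeping the models behind plain callables means the same function could be wired to any event that delivers a recorded file path.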

abidlabs · 2023-04-05T18:23:49Z

Hi @jpmcarrilho just to confirm, you'd like to be able to record audio, do some preprocessing and display it in a Chatbot? This is already possible by using the gr.Audio(source="microphone") component

jpmcarrilho · 2023-04-05T18:36:57Z

Yes, sorry for the bad English. I want to be able to record with gr.Audio, click 'stop recording', press enter, and send it directly to the chat, or send it right after clicking stop. I saw the docs, but they only cover file uploads. Can you provide a reference for the Audio component being sent to the chatbot as soon as I stop recording?

abidlabs · 2023-04-05T19:00:33Z

No worries @jpmcarrilho. Yes this is currently possible in Gradio, you could do something like this:

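import os

import gradio as gr

def get_chatbot_response(x):
    # x is the filepath of the recording; the rename works around a bug
    # (see the note below)
    os.rename(x, x + '.wav')
    # A one-element tuple marks a media file in a Chatbot message
    return [((x + '.wav',), "Your voice sounds nice!")]

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    mic = gr.Audio(source="microphone", type="filepath")
    # Runs whenever the recorded audio changes
    mic.change(get_chatbot_response, mic, chatbot)

demo.launch()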

(The os.rename part is because of a bug that we'll be fixing shortly).

Which produces this:

jpmcarrilho · 2023-04-05T23:02:45Z

Well, thank you, that is exactly what I needed.
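The snippet from abidlabs returns a one-element list, so each new recording replaces whatever the chatbot was showing. A possible variation, as a minimal sketch, keeps the conversation in a `gr.State` and appends to it. The helper names (`ensure_wav_suffix`, `add_audio_turn`, `build_demo`) are hypothetical, and the wiring assumes the same Gradio 3.x API used in the thread (`source="microphone"`). The file is copied rather than renamed, and the copy is skipped when the path already ends in `.wav`.

```python
import shutil
from pathlib import Path


def ensure_wav_suffix(audio_path):
    """Return a path ending in .wav, copying the recording if needed."""
    src = Path(audio_path)
    if src.suffix.lower() == ".wav":
        return str(src)
    dst = src.with_name(src.name + ".wav")
    shutil.copyfile(src, dst)
    return str(dst)


def add_audio_turn(history, audio_path, reply="Your voice sounds nice!"):
    """Append one (audio, text) turn and return a new history list."""
    history = list(history or [])
    if audio_path is None:
        # The recording was cleared; keep the chat as it is.
        return history
    # A one-element tuple marks a media file, as in the snippet above.
    history.append(((ensure_wav_suffix(audio_path),), reply))
    return history


def build_demo():
    # Imported lazily so the helpers above can be used without Gradio.
    import gradio as gr

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        state = gr.State([])  # holds the running conversation
        mic = gr.Audio(source="microphone", type="filepath")

        def on_record(path, history):
            history = add_audio_turn(history, path)
            # Update both the visible chat and the stored history.
            return history, history

        mic.change(on_record, [mic, state], [chatbot, state])
    return demo
```

Calling `build_demo().launch()` would start the app; the reply text here is a fixed placeholder where a real model response would go.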

rc-eddy · 2023-06-28T20:51:21Z

@abidlabs is there an option to auto-play chatbot responses?

dawoodkhan82 · 2023-06-28T21:08:06Z

@rc-eddy currently, there isn't. Although I'll look into adding that when I refactor the chatbot.

lehic · 2024-02-01T17:29:55Z

@abidlabs @dawoodkhan82 Have you had a chance to integrate the auto-play feature for the audio in the chatbot responses? If not, is there a workaround that you can recommend?

dawoodkhan82 · 2024-02-01T20:06:23Z

@lehic So auto-playing audio is not supported in most browsers. There are some workarounds you can try in this issue: #1349. But playing audio without user interaction is generally discouraged.

abidlabs added the enhancement New feature or request label Dec 9, 2022

dawoodkhan82 self-assigned this Feb 6, 2023

dawoodkhan82 mentioned this issue Mar 7, 2023

Making the chatbot even more multimodal #3413

Merged

7 tasks

abidlabs closed this as completed in #3413 Mar 13, 2023

abidlabs mentioned this issue Apr 5, 2023

Fix for default file name of recorded audio files #3770

Merged
