support Audio IO in Chatbot #2768

Closed
1 task done
sijunhe opened this issue Dec 7, 2022 · 13 comments · Fixed by #3413
Labels: enhancement (New feature or request)

Comments

@sijunhe

sijunhe commented Dec 7, 2022

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
Given that Chatbot now supports images after v3.12.0, I think we should double down on the multi-modality and add Audio IO to the chatbot interface. It will be a great interface for Voice Assistant demos.

Describe the solution you'd like
Gradio already has all the building blocks for this feature; they just need to be put together. I think something like the WeChat interface would be nice. Inside the chatbot interface, we replace each text message with a bar, which is the Audio output and plays the audio when clicked. Then we can put the audio input button somewhere prominent for easy input.
image

@abidlabs abidlabs added the enhancement New feature or request label Dec 9, 2022
@paulocoutinhox

+1

@abidlabs
Member

abidlabs commented Feb 6, 2023

cc @dawoodkhan82

@rishikeshF

+1

@abidlabs
Member

Hi @rishikeshF you can already add audio files to your gr.Chatbot as long as you are using the latest version of gradio. Upgrade to the latest version (pip install --upgrade gradio) and then take a look at this section to add media files: https://gradio.app/creating-a-chatbot/#adding-markdown-images-audio-or-videos
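
For readers who cannot open the linked section, here is a minimal sketch of the idea, assuming the tuple-style chat history used by gr.Chatbot in Gradio 3.x: a media message is a one-element tuple holding a file path instead of a plain string. The helper name `add_audio_message` and the `audio_path` argument are made up for illustration and are not part of Gradio.

```python
from typing import List, Optional, Tuple, Union

# One chat turn is a (user_message, bot_message) pair. Each side is a string,
# None, or a tuple whose first element is the path to a media file.
Message = Union[str, Tuple[str, ...], None]
History = List[Tuple[Message, Message]]


def add_audio_message(history: History, audio_path: str,
                      bot_text: Optional[str] = None) -> History:
    """Return a new history with an audio file shown as the user message.

    Hypothetical helper for illustration; not a Gradio function.
    """
    return history + [((audio_path,), bot_text)]


def build_demo(audio_path: str):
    """Build a small Blocks app that shows one audio message.

    gradio is imported here so the helper above can be used without it.
    audio_path must point to an existing audio file.
    """
    import gradio as gr

    with gr.Blocks() as demo:
        gr.Chatbot(value=add_audio_message([], audio_path, "Here is a clip."))
    return demo

# To try it: build_demo("path/to/clip.wav").launch()
```

The helper returns a new list rather than mutating the one passed in, so the same function can be reused inside an event handler that receives the current chat history.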

@jpmcarrilho

jpmcarrilho commented Apr 5, 2023

Hello, I'm building a full AI assistant bot on top of Gradio (wwd, s2t, information retrieval), and I have to control the rendering and outputs of many interfaces and components. No other component is as reactive and fast to respond as the chatbot component, which is why I love it: it is very powerful and straightforward to use. But I think this issue is still not finished. We need to be able to record audio the same way we currently write text and press enter: record audio, stop recording, press enter, handle any post-processing (wav2vec2, whisper), then use LLMs and neural search. I think this is the correct implementation. The trend in the field is toward more and more multimodality (writing, recording, sending an image), as in today's 'XXX GPTs'. This would certainly be a nice addition to the library.

@abidlabs
Member

abidlabs commented Apr 5, 2023

Hi @jpmcarrilho just to confirm, you'd like to be able to record audio, do some preprocessing and display it in a Chatbot? This is already possible by using the gr.Audio(source="microphone") component

@jpmcarrilho

Yes, sorry for the bad English. I want to be able to record with gr.Audio, click 'stop recording', press enter, and send the recording directly to the chat, or send it right after clicking stop. I saw the docs, but they only cover file uploads. Can you provide a reference for sending the Audio component's recording to the chatbot as soon as I stop recording?

@abidlabs
Member

abidlabs commented Apr 5, 2023

No worries @jpmcarrilho. Yes this is currently possible in Gradio, you could do something like this:

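import os

import gradio as gr

def get_chatbot_response(x):
    # x is the path of the recorded file (type="filepath"), or None if cleared
    if x is None:
        return []
    os.rename(x, x + '.wav')
    # A one-element tuple holding a file path is rendered as media in the chat
    return [((x + '.wav',), "Your voice sounds nice!")]

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    mic = gr.Audio(source="microphone", type="filepath")
    # Runs whenever the recording changes, e.g. when recording stops
    mic.change(get_chatbot_response, mic, chatbot)

demo.launch()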

(The os.rename part is because of a bug that we'll be fixing shortly).
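
The snippet above replaces the whole chat on every recording. As a rough sketch of the fuller flow requested earlier in the thread (record, transcribe, generate a reply, keep the conversation), the handler can also take the chatbot's current value as input and append to it. This assumes the same Gradio 3.x API as above and leaves out the os.rename workaround. `fake_transcribe` and `fake_reply` are hypothetical stand-ins for a real speech-to-text model and a real LLM call.

```python
def fake_transcribe(audio_path):
    """Hypothetical placeholder for a speech-to-text call (e.g. Whisper)."""
    return f"<transcript of {audio_path}>"


def fake_reply(text):
    """Hypothetical placeholder for an LLM or search call."""
    return f"You said: {text}"


def respond(audio_path, history):
    """Append one (audio message, text reply) turn to the chat history."""
    history = history or []
    if audio_path is None:  # the recording was cleared, nothing to add
        return history
    text = fake_transcribe(audio_path)
    return history + [((audio_path,), fake_reply(text))]


def build_demo():
    """Wire the handler to a microphone input; gradio is imported lazily."""
    import gradio as gr

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        mic = gr.Audio(source="microphone", type="filepath")
        # The chatbot is both an input (current history) and the output
        mic.change(respond, [mic, chatbot], chatbot)
    return demo

# To try it: build_demo().launch()
```

Keeping `respond` free of Gradio calls makes it easy to swap in real models and to test the history logic on its own.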

Which produces this:

image

@jpmcarrilho

jpmcarrilho commented Apr 5, 2023

Well, thank you, that is exactly what I needed.

@rc-eddy

rc-eddy commented Jun 28, 2023

@abidlabs is there an option to auto-play chatbot responses?

@dawoodkhan82
Collaborator

@rc-eddy currently, there isn't. Although I'll look into adding that when I refactor the chatbot.

@lehic

lehic commented Feb 1, 2024

@abidlabs @dawoodkhan82 Have you had a chance to integrate the auto-play feature for the audio in the chatbot responses? If not, is there a workaround that you can recommend?

@dawoodkhan82
Collaborator

@lehic So auto-playing audio is not supported in most browsers. There are some workarounds you can try in this issue: #1349. But playing audio without user interaction is generally discouraged.
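
One possible approach, which is not necessarily among the workarounds in #1349, is to send the spoken reply to a separate gr.Audio output next to the chatbot rather than playing it inside the chat bubble. The sketch below assumes the installed Gradio version accepts an `autoplay` argument on gr.Audio, and the browser may still block playback. `fake_tts` and the file name `reply.wav` are hypothetical.

```python
def fake_tts(text):
    """Hypothetical placeholder: a real app would synthesize speech from
    the text and return the path of the resulting audio file."""
    return "reply.wav"


def respond(user_text, history):
    """Return the updated chat history and the path of the spoken reply."""
    reply_text = f"You said: {user_text}"
    reply_audio = fake_tts(reply_text)
    new_history = (history or []) + [(user_text, reply_text)]
    return new_history, reply_audio


def build_demo():
    """Chatbot plus a separate audio player; gradio is imported lazily."""
    import gradio as gr

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        box = gr.Textbox()
        # Assumption: this Gradio version supports autoplay on gr.Audio
        player = gr.Audio(autoplay=True)
        box.submit(respond, [box, chatbot], [chatbot, player])
    return demo

# To try it: build_demo().launch()
```

Because playback here follows a user action (submitting the textbox), browsers are more likely to allow it than audio that starts with no interaction at all.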


8 participants