Live audio streaming output #5077

Merged
merged 17 commits into from Aug 8, 2023

aliabid94
Collaborator

@aliabid94 aliabid94 commented Aug 3, 2023

This PR allows users to stream audio out. See demo/streaming_audio_out for an example that streams out pieces of an audio file second by second.

Fixes: #5110

@gradio-pr-bot
Contributor

gradio-pr-bot commented Aug 3, 2023

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
@gradio/upload minor
gradio minor

With the following changelog entry.

Live audio streaming output

@gradio-pr-bot
Contributor

gradio-pr-bot commented Aug 3, 2023

🎉 The demo notebooks match the run.py files! 🎉

@gradio-pr-bot
Contributor

gradio-pr-bot commented Aug 3, 2023

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-5077-all-demos


You can install the changes in this PR by running:

pip install https://gradio-builds.s3.amazonaws.com/ab0b17df50c938c4fa6ff0608805406025087774/gradio-3.39.0-py3-none-any.whl

@abidlabs
Member

abidlabs commented Aug 3, 2023

Tested this out and it works great @aliabid94, even with multiple outputs! However, I'm concerned that it uses a completely different mechanism for streaming than regular generator functions do, including a separate /stream route.

Is this actually necessary? I don't have a concrete alternative right now, but among other things, this breaks the client for any route with streaming. You can try by running:

import gradio as gr
import numpy as np
from pydub import AudioSegment
import time

def stream_audio(lag):
    audio_file = 'test.mp3'  # Your audio file path
    audio = AudioSegment.from_mp3(audio_file)
    chunk_length = 1000
    chunks = []
    while len(audio) > chunk_length:
        chunks.append(audio[:chunk_length])
        audio = audio[chunk_length:]
    if len(audio):  # Ensure we don't end up with an empty chunk
        chunks.append(audio)

    def iter_chunks():  
        for chunk in chunks:
            file_like_object = chunk.export(format="mp3")
            data = file_like_object.read()
            time.sleep(lag)
            yield data

    return iter_chunks(), "fixed response"

demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True), gr.Textbox()]
)

if __name__ == "__main__":
    _, url, _ = demo.launch()

and then:

from gradio_client import Client

client = Client(url)
result = client.predict(
    0,  # int | float (numeric value between 0 and 3) in 'lag' Slider component
    api_name="/predict",
)
print(result)

@abidlabs
Member

abidlabs commented Aug 3, 2023

Also, the user-facing API, which requires returning the generator, is a little different from how Gradio users are used to generating/streaming. I would have expected something like this as the API (yielding directly from the function, plus setting streaming=True in the Audio component):

import gradio as gr
import numpy as np
from pydub import AudioSegment
import time

def stream_audio(lag):
   ...
        for chunk in chunks:
            file_like_object = chunk.export(format="mp3")
            data = file_like_object.read()
            time.sleep(lag)
            yield data, "fixed response"

    

demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True, streaming=True), gr.Textbox()]
)

if __name__ == "__main__":
    _, url, _ = demo.launch()

@abidlabs
Member

abidlabs commented Aug 3, 2023

Ok so here's an idea that doesn't fix everything above but I think would allow you to use the above developer API.

Steps:

  • Developer writes a regular generator function, something like this:
def stream_audio(lag):
   ...
        for chunk in chunks:
            file_like_object = chunk.export(format="mp3")
            data = file_like_object.read()
            time.sleep(lag)
            yield data, "fixed response"

and sets streaming=True in the Audio() output component like this:

demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True, streaming=True), gr.Textbox()]
)
  • Gradio checks whether any of the outputs have set streaming=True and, if so, doesn't evaluate the generator function but instead passes it into a FastAPI StreamingResponse
  • In the Client, we can check to see if an endpoint has any outputs that stream, and if so, we make them invalid endpoints so that they don't show up in the view API page

@aliabid94
Collaborator Author

Gradio sees if any of the outputs have set streaming=True and if so, doesn't evaluate the generator function, but instead pass it into a FastAPI StreamingResponse

StreamingResponse requires a generator that only yields bytes. We could "wrap" the generator with another generator that tosses out all other outputs. However this will obviously ignore the intended user behaviour of setting the other outputs. There's no way we would be able to get access to the other outputs because we don't have access to the outputs as they are being yielded - only FastAPI does, so we can't send updates or anything with those outputs.
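
To make the limitation concrete, here is a minimal sketch of the "wrapping" approach in plain Python, with no FastAPI dependency. The names `bytes_only` and `fake_stream_audio` are hypothetical and exist only for illustration; the sketch assumes the user's generator yields `(audio_bytes, text)` tuples with the audio at position 0.

```python
from typing import Iterator, Tuple


def fake_stream_audio() -> Iterator[Tuple[bytes, str]]:
    """Hypothetical stand-in for a user function that yields (audio, text)."""
    yield b"chunk-1", "first lyric"
    yield b"chunk-2", "second lyric"


def bytes_only(gen: Iterator[Tuple[bytes, str]], index: int = 0) -> Iterator[bytes]:
    """Keep only the output at `index`; every other output is discarded."""
    for outputs in gen:
        yield outputs[index]


# Only the audio bytes survive; the text outputs are lost along the way.
streamed = list(bytes_only(fake_stream_audio()))
print(streamed)
```

The wrapped generator is something a bytes-only consumer could accept, but the text values never reach any code that could forward them to the other output component, which is the problem described above.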

@abidlabs
Member

abidlabs commented Aug 3, 2023

Suppose you wanted to create a demo that streamed music and also generated lyrics in real time for the streaming music. That would not be possible with this API, correct?

@abidlabs
Member

abidlabs commented Aug 3, 2023

I think we need to do something like this:

  • When a user passes in a regular generator function, and one of the output components has streaming=True, then a pending_stream is created for that component. Think of the pending_stream as just a regular list.
  • On every iteration of the generator function, we take the output corresponding to that component and append it to pending_stream. Once the generator function completes or errors out, we append a special StopIteration token to pending_stream

In the /stream route, we define a generator that looks like this:

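def stream_until_complete():
    chunks = pending_stream  # list filled by the user's generator function
    index = 0
    while True:
        if index >= len(chunks):
            yield None  # nothing new yet; the caller should wait and retry
            continue
        chunk = chunks[index]
        index += 1
        if chunk is StopIteration:  # special token marking the end of the stream
            return
        yield chunk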

(code may need to be tweaked but this is the general idea)

Then you pass in stream_until_complete into FastAPI's StreamingResponse

The basic idea is that instead of directly passing in our generator function to StreamingResponse (which would mean we lose the other outputs as you said), here we use our generator function to populate a list (potentially even multiple lists if there are multiple streaming output components), and have a second generator that reads from that list which is passed into StreamingResponse.

The benefits of this approach, I believe, would be that it (1) lets developers keep an API they are familiar with and (2) allows for use cases where multiple outputs stream together.

@aliabid94
Collaborator Author

Ok now I accept direct yielding from the function, see demo/stream_audio_out/ for an example. Ready for re-review @abidlabs

$code_stream_frames

Streaming can also be done in an output component. A `gr.Audio(streaming=True)` output component can take a stream of audio data yielded piece-wise by a generator function and combine the pieces into a single audio file.
Member

Let's put the stream_audio_out example demo here (ideally after simplifying it a bit)

@abidlabs
Member

abidlabs commented Aug 8, 2023

Here's a simplified demo you can use @aliabid94:

import gradio as gr
from pydub import AudioSegment
import time


def stream_audio(audio_file, lag):
    audio = AudioSegment.from_mp3(audio_file)
    i = 0
    chunk_size = 1000  # chunk length in milliseconds

    while chunk_size*i < len(audio):
        chunk = audio[chunk_size*i:chunk_size*(i+1)]
        i += 1
        if chunk:
            file = f"/tmp/{i}.mp3"
            chunk.export(file, format="mp3")
            time.sleep(lag)  # simulate a slow generator, as the "lag" slider describes
            yield file, i
        
demo = gr.Interface(
    fn=stream_audio,
    inputs=[
        gr.Audio(type="filepath", label="Audio file to stream"),
        gr.Slider(0, 3, 0,
            label="lag",
            info="Duration before generating next second of audio. Set >1s to cause lag.",
        ),
    ],
    outputs=[
        gr.Audio(
            autoplay=True, 
            streaming=True), # needed to stream output audio
        gr.Textbox()
    ],
)

if __name__ == "__main__":
    demo.queue().launch()

@abidlabs
Member

abidlabs commented Aug 8, 2023

Noticing some small issues:

  1. When trying to stream two output audio files at the same time, this doesn't work (raises a mysterious KeyError):

Here's an adaptation of the code above:

import gradio as gr
from pydub import AudioSegment
import time


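def stream_audio(audio_file, lag):
    audio = AudioSegment.from_mp3(audio_file)
    i = 0
    chunk_size = 1000  # chunk length in milliseconds

    while chunk_size*i < len(audio):
        chunk = audio[chunk_size*i:chunk_size*(i+1)]
        i += 1
        if chunk:
            file = f"/tmp/{i}.mp3"
            chunk.export(file, format="mp3")
            time.sleep(lag)  # simulate a slow generator, as the "lag" slider describes
            yield file, file  # same chunk sent to both streaming Audio outputs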
        
demo = gr.Interface(
    fn=stream_audio,
    inputs=[
        gr.Audio(type="filepath", label="Audio file to stream"),
        gr.Slider(0, 3, 0,
            label="lag",
            info="Duration before generating next second of audio. Set >1s to cause lag.",
        ),
    ],
    outputs=[
        gr.Audio(
            autoplay=True, 
            streaming=True), # needed to stream output audio
        gr.Audio(
            autoplay=True, 
            streaming=True), # needed to stream output audio
    ],
)

if __name__ == "__main__":
    demo.queue().launch()
  2. There's a very brief but slightly jarring discontinuity between the chunks when livestreaming the output audio. It's not clear to me whether it's an issue with the chunking logic or with the streaming logic. This is the audio file I tried: https://dl.sndup.net/tckv/test.mp3

@aliabid94
Collaborator Author

When trying to stream two output audio files at the same time, this doesn't work (raises a mysterious KeyError):

Fixed.

There's a very brief but slightly jarring discontinuity in between the chunks when livestreaming the output audio

I think it's because we were streaming 1 second chunks, which was too frequent. Increased to 3 second chunks in the demo and the breaks are much better imo.
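
As a small illustration of why a larger chunk size reduces the number of audible seams, the sketch below computes chunk boundaries in milliseconds (the unit pydub uses when slicing an AudioSegment). `chunk_bounds` is a hypothetical helper, not part of the demo or of Gradio, and the 10-second duration is an arbitrary example value. It only counts boundaries; it does not say anything about what causes the discontinuity itself.

```python
from typing import List, Tuple


def chunk_bounds(total_ms: int, chunk_ms: int) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets covering [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


one_second = chunk_bounds(10_000, 1_000)
three_second = chunk_bounds(10_000, 3_000)

# Each pair of adjacent chunks shares one boundary where a seam can occur.
print(len(one_second) - 1, len(three_second) - 1)
```

For a 10-second clip, 1-second chunks give 9 boundaries and 3-second chunks give 3, so any per-boundary glitch is heard less often with the larger size.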

@abidlabs
Member

abidlabs commented Aug 8, 2023

I think it's because we were streaming 1 second chunks, which was too frequent. Increased to 3 second chunks in the demo and the breaks are much better imo.

I think I still hear them with the 3-second chunks, but it's very minor, so not a blocker imo.

Member

@abidlabs abidlabs left a comment

Awesome PR @aliabid94!

@aliabid94 aliabid94 merged commit 667875b into main Aug 8, 2023
13 checks passed
@aliabid94 aliabid94 deleted the stream_audio branch August 8, 2023 22:08
@gradio-pr-bot
Contributor

gradio-pr-bot commented Aug 8, 2023

🎉 Chromatic build completed!

There are 0 visual changes to review.
There are 0 failed tests to fix.

Development

Successfully merging this pull request may close these issues.

Audio streaming is not working properly
3 participants