
stream param to Speech and AsyncSpeech #724

Closed
wants to merge 2 commits

Conversation

@antont commented Nov 7, 2023

The OpenAI help center says, in https://help.openai.com/en/articles/8555505-tts-api#h_f2f424c6cb:

Is it possible to stream audio?
Yes! By setting stream=True, you can chunk the returned audio file.

However, the API reference does not have such a param in https://platform.openai.com/docs/api-reference/audio/createSpeech

The guide again says, in https://platform.openai.com/docs/guides/text-to-speech:

The Speech API provides support for real time audio streaming using chunk transfer encoding. This means that the audio is able to be played before the full file has been generated and made accessible.

However, the code there seems to read the content in full before responding over HTTP:

  # Convert the binary response content to a byte stream
  byte_stream = io.BytesIO(response.content)

I experimented with this, and indeed that version did not appear to stream.

I modified the speech part of the client lib to pass the stream: bool onwards, and I think this way it actually works: my playback starts much sooner. I have not verified this (yet) with proper debugging.

This should be considered a draft, because apparently the stream=True param for speech.create changes the return type from HttpxBinaryResponseContent to AsyncStream. I did not fix the type annotations etc.
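To illustrate why passing the stream onwards helps, here is a minimal sketch of the difference between buffering the whole body (as in the guide's io.BytesIO snippet) and consuming it chunk by chunk. The chunk generator is a simulated stand-in for an HTTP body delivered with chunked transfer encoding, not the actual client API:

```python
import io

def fake_response_chunks():
    # Stand-in for an HTTP response body arriving via chunked
    # transfer encoding; each yield simulates one network chunk.
    for i in range(3):
        yield b"audio-chunk-%d " % i

# Buffered approach: nothing can be played until every chunk
# has arrived and been joined into one byte string.
byte_stream = io.BytesIO(b"".join(fake_response_chunks()))
full = byte_stream.read()

# Streamed approach: each chunk can be handed to the audio
# player as soon as it arrives, so playback starts earlier.
played = []
for chunk in fake_response_chunks():
    played.append(chunk)  # e.g. feed the chunk to an audio player here

print(b"".join(played) == full)
```

The audio produced is identical either way; the difference is only in when the first bytes become available to the player.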

@antont antont requested a review from a team as a code owner November 7, 2023 21:46
@antont antont marked this pull request as draft November 7, 2023 21:56
…nse is an AsyncStream

NOTE: a specific Audio chunk type would probably be nice, but I am leaving that to the lib designers.
@antont antont marked this pull request as ready for review November 9, 2023 06:45
@rattrayalex (Collaborator)

Thanks for the PR; we're working on this but may do it a different way (stream_cls should not be ChatCompletionChunk of course).

I'll close this for now but do keep your eyes peeled for this feature in the next week or two.

@antont (Author) commented Nov 10, 2023

Right, thanks for the info.

I'm happy that it works with this modification, though, so we can already benefit from the quicker start. At least in my testing there is a big difference: with streaming, playback starts in about 2 seconds for a 50-word text, while without the parameter it takes maybe 6-8 seconds.

Good to know that the proper support is coming!

@antont antont mentioned this pull request Nov 23, 2023