
stream param to Speech and AsyncSpeech #724

Closed
wants to merge 2 commits

Conversation

@antont commented Nov 7, 2023

The OpenAI help center says, in https://help.openai.com/en/articles/8555505-tts-api#h_f2f424c6cb:

Is it possible to stream audio?
Yes! By setting stream=True, you can chunk the returned audio file.

However, the API reference does not have such a param in https://platform.openai.com/docs/api-reference/audio/createSpeech

The guide again says, in https://platform.openai.com/docs/guides/text-to-speech:

The Speech API provides support for real time audio streaming using chunk transfer encoding. This means that the audio is able to be played before the full file has been generated and made accessible.

However, the code there seems to read the content in full before responding over HTTP:

  # Convert the binary response content to a byte stream
  byte_stream = io.BytesIO(response.content)

I experimented with this, and indeed that version did not appear to stream.

I modified the speech part of the client lib to pass the stream: bool onwards, and I think this way it actually works: my playback starts much sooner. I have not verified this (yet) with proper debugging.

This should be considered a draft, because apparently the stream=True param for speech.create changes the return type from HttpxBinaryResponseContent to AsyncStream. I did not fix the type annotations etc.
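To illustrate why passing the stream onwards helps, here is a minimal sketch of the difference between buffering the whole body (as in the guide's io.BytesIO snippet) and consuming it chunk by chunk. The chunk generator is a simulated stand-in for an HTTP body delivered with chunked transfer encoding, not the actual client API:

```python
import io

def fake_response_chunks():
    # Stand-in for an HTTP response body arriving via chunked
    # transfer encoding; each yield simulates one network chunk.
    for i in range(3):
        yield b"audio-chunk-%d " % i

# Buffered approach: nothing can be played until every chunk
# has arrived and been joined into one byte string.
byte_stream = io.BytesIO(b"".join(fake_response_chunks()))
full = byte_stream.read()

# Streamed approach: each chunk can be handed to the audio
# player as soon as it arrives, so playback starts earlier.
played = []
for chunk in fake_response_chunks():
    played.append(chunk)  # e.g. feed the chunk to an audio player here

print(b"".join(played) == full)
```

The audio produced is identical either way; the difference is only in when the first bytes become available to the player.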

@antont antont requested a review from a team as a code owner November 7, 2023 21:46
@antont antont marked this pull request as draft November 7, 2023 21:56
…nse is an AsyncStream

NOTE: a specific Audio chunk type would probably be nice, but I am leaving that to the lib designers.
@antont antont marked this pull request as ready for review November 9, 2023 06:45
@rattrayalex (Collaborator)

Thanks for the PR; we're working on this but may do it a different way (stream_cls should not be ChatCompletionChunk of course).

I'll close this for now but do keep your eyes peeled for this feature in the next week or two.

@antont (Author) commented Nov 10, 2023

Right, thanks for the info.

I'm happy that it works with this modification, though, so we can already benefit from the quicker start. At least in my testing there is a big difference: with streaming, playback starts in about 2 seconds for a 50-word text, while without the parameter it takes maybe 6-8 seconds.

Good to know that the proper support is coming!

@antont antont mentioned this pull request Nov 23, 2023