
How to stream Text to Speech? #487

Closed · user080975 opened this issue Nov 11, 2023 · 17 comments

@user080975 commented Nov 11, 2023

According to the documentation here for Text to Speech:
https://platform.openai.com/docs/guides/text-to-speech?lang=node

There is the possibility of streaming audio without waiting for the full file to buffer, but the example is Python only. Is there any way to stream the incoming audio using Node.js?

@abhishekgoyal1 commented Nov 13, 2023

This needs to be added. I've tried all sorts of approaches, but I don't think it's supported natively in the Node SDK at the moment.

The following does return a streamable object, but no chunks are found while iterating through it:

const stream = await openai.audio.speech.create(
  {
    model: 'tts-1',
    voice: 'alloy',
    input: textData,
    response_format: 'opus',
  },
  { stream: true },
);

@rattrayalex (Collaborator)

Yes, this works today – I'm sorry that the example code doesn't reflect that.

You can simply access response.body which is a readable stream (in web, a true ReadableStream and in Node, a Readable), like so:

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  const stream = response.body;
}

I'll try to update the example soon, and won't close this issue until I do. Feel free to share use-cases you'd like to see in the example here, with sample code.
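For instance, here's a minimal sketch of iterating over the audio chunks as they arrive (assuming import 'openai/shims/node' is active, so response.body is a Node Readable, which is async-iterable):

import 'openai/shims/node';
import OpenAI from 'openai';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  // With the Node shim, response.body is a Readable; each chunk is a Buffer
  // of encoded audio bytes, available before the full file has downloaded.
  for await (const chunk of response.body) {
    console.log(`received ${chunk.length} bytes`);
  }
}

main();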

rattrayalex self-assigned this Nov 14, 2023

@c121914yu

export async function text2Speech({
  res,
  onSuccess,
  onError,
  model = defaultAudioSpeechModels[0].model,
  voice = Text2SpeechVoiceEnum.alloy,
  input,
  speed = 1
}: {
  res: NextApiResponse;
  onSuccess: (e: { model: string; buffer: Buffer }) => void;
  onError: (e: any) => void;
  model?: string;
  voice?: `${Text2SpeechVoiceEnum}`;
  input: string;
  speed?: number;
}) {
  const ai = getAIApi();
  const response = await ai.audio.speech.create({
    model,
    voice,
    input,
    response_format: 'mp3',
    speed
  });

  const readableStream = response.body as unknown as NodeJS.ReadableStream;
  readableStream.pipe(res);

  let bufferStore = Buffer.from([]);

  readableStream.on('data', (chunk) => {
    bufferStore = Buffer.concat([bufferStore, chunk]);
  });
  readableStream.on('end', () => {
    onSuccess({ model, buffer: bufferStore });
  });
  readableStream.on('error', (e) => {
    onError(e);
  });
}

This is my example; it uses the Next.js framework. I hope it will be helpful.
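For completeness, a hedged sketch of how one might call this from a pages-router API route (the route path and import path are hypothetical, not from my project):

// pages/api/tts.ts — hypothetical wiring for the helper above
import type { NextApiRequest, NextApiResponse } from 'next';
import { text2Speech } from '@/service/tts'; // hypothetical import path

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  res.setHeader('Content-Type', 'audio/mpeg');
  await text2Speech({
    res,
    input: String(req.query.input ?? ''),
    onSuccess: ({ model, buffer }) => {
      // the complete mp3 buffer is available here, e.g. for caching
    },
    onError: (e) => console.error(e)
  });
}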

@juhana commented Nov 16, 2023

Note that the TypeScript types aren't correct when reading the response as a stream in Node. You have to do const stream = response.body as unknown as Readable; for it not to throw type errors.

@rattrayalex (Collaborator)

To fix those type errors, add import 'openai/shims/node' to the top of your file (details here) if you're on Node, or import 'openai/shims/web' if you're on anything else.

We're working to improve this.
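For example (a minimal sketch, assuming Node; the shim import has to run before anything else touches the SDK):

import 'openai/shims/node'; // must come before the 'openai' import
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'hello world',
  });

  // With the shim active, response.body is typed as a Node stream,
  // so no `as unknown as Readable` cast is needed.
  if (response.body) {
    response.body.pipe(fs.createWriteStream('./speech.mp3'));
  }
}

main();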

@PetersonFonseca

Hello everyone, can you please help me implement this in Node? I can't make it work...

import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_SECRET_KEY,
});

const response = openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "Teste de texto para fala.", // "Text-to-speech test."
});

response.stream_to_file(path.resolve("./speech.mp3"));

@rattrayalex (Collaborator)

We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example:

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

// gets API Key from environment variable OPENAI_API_KEY
const openai = new OpenAI();

const speechFile = path.resolve(__dirname, './speech.mp3');

async function streamToFile(stream: NodeJS.ReadableStream, path: fs.PathLike) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown chicken jumped over the lazy dogs',
  });

  await streamToFile(mp3.body, speechFile);
}
main();

@yaozijun

async function streamToFile(stream, path) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path)
      .on('error', reject)
      .on('finish', resolve);

    stream.pipe(writeStream)
      .on('error', (error) => {
        writeStream.close();
        reject(error);
      });
  });
}

const ret = await openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "test",
});
const stream = ret.body;
const speechFile = path.resolve(`/xxx/test.mp3`);
await streamToFile(stream, speechFile);

@karar-shah

> (quoting @c121914yu's text2Speech example above)

@c121914yu Could you kindly assist me in playing the audio stream on the client side? Thank you.

@karar-shah

> (quoting @rattrayalex's streamToFile example above)

@rattrayalex, I would like to ask how to stream the audio response in a client component in Next.js. Despite searching for the past day, I have been unable to find a solution.
Thank you so much for your help.

@c121914yu

> (quoting @karar-shah's question above)

https://github.com/labring/FastGPT/blob/main/projects/app/src/web/common/utils/voice.ts

I don't have a computer with me at the moment, so I can't easily copy the code.

You can refer to my code for client-side streaming via fetch and the MediaSource API.

However, I have found that this API has some compatibility issues on Apple products.
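In outline, the approach looks roughly like this (a sketch, not the FastGPT code verbatim; /api/tts is a hypothetical endpoint):

// Rough sketch: stream a TTS endpoint into an audio element via MediaSource.
async function playStream(text: string) {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));

  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const response = await fetch(`/api/tts?input=${encodeURIComponent(text)}`);
    const reader = response.body!.getReader();

    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // appendBuffer is async; wait for updateend before appending more.
      await new Promise<void>((resolve) => {
        sourceBuffer.addEventListener('updateend', () => resolve(), { once: true });
        sourceBuffer.appendBuffer(value);
      });
    }
    mediaSource.endOfStream();
  });

  await audio.play(); // starts as soon as enough data is buffered
}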

@athrael-soju

> (quoting @c121914yu's reply above)

You're gonna need a polyfill for that.
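(For what it's worth, a quick feature-detection sketch before falling back to a plain, non-streaming audio element:)

// Feature-detect MediaSource support for mp3; fall back to a plain
// (non-streaming) <audio src> when it's unavailable, e.g. on iOS Safari.
function canStreamWithMediaSource(): boolean {
  return typeof MediaSource !== 'undefined' && MediaSource.isTypeSupported('audio/mpeg');
}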

@aleksa-codes

> (quoting @c121914yu's text2Speech example above)

Can someone please help me: how do I use this, or something similar, to build an API route handler (endpoint) and call it from a frontend component in Next.js? I am basically trying to rebuild the TTS functionality that is in ChatGPT.

@LeakedDave commented Jun 10, 2024

Hey Aleksa, I stumbled upon this because I'm building it myself. If you still need help, I rewrote the above example as a simple API route (pages router, /api/voice.js):

import OpenAI from "openai";
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
})

export default async function handler(req, res) {
    const { input } = req.query
    res.setHeader('Content-Type', 'audio/mpeg')

    const response = await openai.audio.speech.create({
        model: "tts-1",
        voice: "alloy",
        input: input,
        response_format: 'mp3',
        speed: 1
    })

    const readableStream = response.body
    readableStream.pipe(res)

    let bufferStore = Buffer.from([])

    readableStream.on('data', (chunk) => {
        bufferStore = Buffer.concat([bufferStore, chunk])
    })

    readableStream.on('end', () => {
        // Store the mp3 somewhere if you want to reuse it
        // onSuccess({ model, buffer: bufferStore });
    })

    readableStream.on('error', (e) => {
        console.error(e)
    })
}

To play it from your client side, it's simple:

const input = "Today is a wonderful day to build something people love!"
// encodeURIComponent keeps special characters in the text from breaking the query string
new Audio(`/api/voice?input=${encodeURIComponent(input)}`).play()

@kifjj commented Oct 4, 2024

Hi LeakedDave and Aleksa,

I implemented the simple API route example with Node.js/Express and hosted it in several environments (Google Firebase, Google App Engine). While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds.

I tried many things on the servers (increasing memory, moving to a closer region) but no luck.

Any idea?

@LeakedDave

> (quoting @kifjj's question above)

Honestly I'm not sure. If possible, I'd suggest just hosting a Next.js API for this; I haven't tested it with vanilla Express at all. It sounds like your host doesn't support streaming, since a 6-8 second wait would be about the time to generate the full audio, I think.

@kifjj commented Oct 5, 2024

@LeakedDave oh, you are right: Firebase Functions and Google App Engine don't support streaming. Thanks for putting me on the right path.
Looking around, I see AWS Lambda introduced streaming support a year ago, but with some limitations (API Gateway and ALB are not supported).

I tried it and it works: the stream starts in about 3s, or 5s on a cold start. Much better user experience.

Here is the code if someone needs it:

  • I started from the AWS Lambda streaming example and tweaked it with my code just to make it work
  • be sure to select Node.js
  • be sure to increase the timeout; the default is just 3s
  • for the function URL: 1) use auth type NONE if you want public access, and 2) IMPORTANT: select invoke mode RESPONSE_STREAM, otherwise it won't stream :-)

/* global fetch */
import util from 'util';
import stream from 'stream';
const pipeline = util.promisify(stream.pipeline);

// OpenAI API key (an undefined `okey` in the original; read it from the environment)
const okey = process.env.OPENAI_API_KEY;

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  console.log("Query params: " + event["queryStringParameters"]["text"]);

  const textToTTS = event["queryStringParameters"]["text"];

  if (!textToTTS) {
    console.log("no text to translate sent [" + textToTTS + "]");
    return;
  }

  const rs = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + okey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: textToTTS,
      model: 'tts-1',
      response_format: 'mp3',
      voice: 'echo',
    }),
  });

  // Pipe the OpenAI response body straight into the Lambda response stream
  await pipeline(rs.body, responseStream);
});
