# OpenAI Text2Speech

This notebook shows how to use the `OpenAI Speech API` to convert text to speech.

First, install the required dependencies. You also need a server that is compatible with the OpenAI Speech API; one option is to follow the "Quickest Start (docker run)" section of [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI). This makes `http://localhost:8880/v1` available to accept the Text2Speech requests in this example.

In [None]:
!apt update && apt install -y portaudio19-dev build-essential ffmpeg
%pip install --upgrade --quiet openai pyaudio ipython langchain_community langchain-openai torchaudio transformers sounddevice soundfile
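
Once the server is running, it can help to see what a Text2Speech request looks like on the wire. The following is a minimal sketch using only the standard library, assuming the server exposes the OpenAI-style `POST /audio/speech` route under the base URL above and accepts a JSON body with `model`, `input`, and `voice`. The helper names `build_speech_request` and `fetch_speech` are hypothetical and are not part of LangChain or Kokoro-FastAPI.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:8880/v1",
    model: str = "kokoro",
    voice: str = "af_sky+af_bella",
    api_key: str = "not-needed",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /audio/speech route."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_speech(text: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw audio bytes (needs a running server)."""
    with urllib.request.urlopen(build_speech_request(text), timeout=timeout) as resp:
        return resp.read()


# Building the request does not touch the network, so this runs anywhere.
request = build_speech_request("Hello world!")
print(request.get_method(), request.full_url)
```

Calling `fetch_speech("Hello world!")` only succeeds when a compatible server is actually listening on the assumed address.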

## Usage

In [None]:
from langchain_community.tools import OpenAIText2SpeechTool

text_to_speak = "Hello world!"

tts = OpenAIText2SpeechTool(
    model_id="kokoro",
    voice="af_sky+af_bella",
    base_url="http://localhost:8880/v1",  # the local server started above
    api_key="not-needed",  # placeholder value, as the name suggests
)

'openai_text2speech'

We can generate audio, save it to a temporary file, and then play it.

In [None]:
speech_file = tts.run(text_to_speak)
tts.play(speech_file)
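
The "save to a temporary file" step can be pictured with a small standard-library sketch. This is not the tool's actual implementation; `save_audio_to_temp_file` is a hypothetical helper, and the placeholder bytes stand in for real audio returned by a server.

```python
import tempfile
from pathlib import Path


def save_audio_to_temp_file(audio: bytes, suffix: str = ".wav") -> str:
    """Write audio bytes to a new temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a player can open it later by path.
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as handle:
        handle.write(audio)
        return handle.name


path = save_audio_to_temp_file(b"placeholder audio bytes")
print(Path(path).suffix, Path(path).stat().st_size)
```

Because the file is not deleted automatically, the caller is responsible for removing it once playback is finished.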

Or stream audio directly.

In [None]:
tts.stream_speech(text_to_speak)
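
Streaming differs from the file-based approach in that audio is handled chunk by chunk as it arrives rather than after the whole response has been received. A minimal sketch of that idea follows, assuming a file-like byte source; `iter_audio_chunks` and `stream_to_sink` are hypothetical helpers, and an in-memory buffer stands in for the server response.

```python
import io
from typing import BinaryIO, Callable, Iterator


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks from a file-like byte source until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_to_sink(
    source: BinaryIO, write: Callable[[bytes], object], chunk_size: int = 1024
) -> int:
    """Pass each chunk to `write` as soon as it is read; return total bytes handled."""
    total = 0
    for chunk in iter_audio_chunks(source, chunk_size):
        write(chunk)
        total += len(chunk)
    return total


# An in-memory buffer stands in for a streamed HTTP response body.
fake_response = io.BytesIO(b"\x00" * 2500)
received = []
total = stream_to_sink(fake_response, received.append, chunk_size=1024)
print(total, [len(chunk) for chunk in received])
```

In a real player, `write` would be the method that feeds bytes to the audio output device instead of `received.append`.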

## Use within an Agent

In [None]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import OpenAI

In [None]:
llm = OpenAI(temperature=0)
tools = load_tools(["openai_text2speech"])
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

In [None]:
audio_file = agent.run("Tell me a joke and read it out for me.")



> Entering new AgentExecutor chain...
Action:
```
{
  "action": "openai_text2speech",
  "action_input": {
    "query": "Why did the chicken cross the playground? To get to the other slide!"
  }
}
```

Observation: /tmp/tmpsfg783f1.wav
Thought: I have the audio file ready to be sent to the human
Action:
```
{
  "action": "Final Answer",
  "action_input": "/tmp/tmpsfg783f1.wav"
}
```

> Finished chain.


In [None]:
tts.play(audio_file)
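
Since the agent's final answer is text produced by the LLM, it may be worth confirming that it really is a path to a non-empty file before handing it to `tts.play`. The following is a minimal sketch under that assumption; `resolve_audio_path` is a hypothetical helper, and the temporary file stands in for the one the agent created.

```python
import tempfile
from pathlib import Path


def resolve_audio_path(agent_answer: str) -> Path:
    """Return the answer as a Path if it names an existing, non-empty file."""
    # Strip whitespace and stray quotes the LLM may have wrapped around the path.
    candidate = Path(agent_answer.strip().strip("'\""))
    if not candidate.is_file():
        raise FileNotFoundError(f"Agent answer is not an existing file: {agent_answer!r}")
    if candidate.stat().st_size == 0:
        raise ValueError(f"Audio file is empty: {candidate}")
    return candidate


# A temporary file stands in for the audio file written during the agent run.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
    handle.write(b"placeholder audio bytes")

checked = resolve_audio_path(f"  {handle.name}\n")
print(checked.suffix)
```

With such a check in place, `tts.play(str(resolve_audio_path(audio_file)))` would fail early with a clear error instead of passing an arbitrary string to the player.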