# Text to Speech
## Introduction
The Audio API provides a speech endpoint based on our TTS (text-to-speech) model. It comes with six built-in voices and can be used to:
* Narrate a written blog post
* Produce spoken audio in multiple languages
* Give real-time audio output using streaming

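```python
import os

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

speech_file_path = "output/speech.mp3"
# write_to_file() does not create missing directories, so create output/ first
os.makedirs(os.path.dirname(speech_file_path), exist_ok=True)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!",
)

response.write_to_file(speech_file_path)
```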

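```python
from pydub import AudioSegment
from pydub.playback import play

# Decoding MP3 requires FFmpeg. play() uses simpleaudio or pyaudio if installed,
# otherwise it falls back to FFmpeg's ffplay (the source of the log output below)
audio = AudioSegment.from_file(speech_file_path, format="mp3")
play(audio)
```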

```text
Input #0, wav, from '/var/folders/_2/4yj9mbbn2_zg36jb021hl_gh0000gn/T/tmp7sej95o1.wav':
  Duration: 00:00:03.53, bitrate: 384 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
```
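The example above converts a single sentence. Narrating something longer, such as the blog post mentioned in the introduction, may require splitting the text across several requests, because the speech endpoint caps the length of `input` (4,096 characters per request at the time of writing; treat that figure as an assumption and check the current API reference). The sketch below splits on sentence boundaries. `chunk_text` and `MAX_INPUT_CHARS` are hypothetical names, and the sentence regex is a simple heuristic rather than real sentence segmentation.

```python
import re

# Assumed per-request input limit; verify against the current API reference.
MAX_INPUT_CHARS = 4096


def chunk_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Heuristic: a sentence ends with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split (possibly mid-word).
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `client.audio.speech.create` in turn, and the resulting clips joined, for example by loading each file with `AudioSegment.from_file` and adding the segments together with `+`.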

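```python
# Try in Turkish ("Today is a very nice day to write code!")
import os

from openai import OpenAI

client = OpenAI()

speech_file_path = "output/speech.mp3"
os.makedirs(os.path.dirname(speech_file_path), exist_ok=True)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Bugün kod yazmak için çok güzel bir gün!",
    response_format="mp3",  # the default; see "Supported output formats"
    speed=1.0,  # the default; accepted range is 0.25 to 4.0
)

response.write_to_file(speech_file_path)
```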

```python
from pydub import AudioSegment
from pydub.playback import play

audio = AudioSegment.from_file(speech_file_path, format="mp3")
play(audio)
```

```text
Input #0, wav, from '/var/folders/_2/4yj9mbbn2_zg36jb021hl_gh0000gn/T/tmpnum7_162.wav':
  Duration: 00:00:02.69, bitrate: 384 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
```

## Audio quality
For real-time applications, the standard `tts-1` model provides the lowest latency, but at lower quality than the `tts-1-hd` model. Because of the way the audio is generated, `tts-1` is more likely than `tts-1-hd` to produce static in certain situations. In some cases the difference may not be noticeable, depending on your listening device and the listener.
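In application code, this trade-off often reduces to a single switch. The sketch below is a hypothetical helper, `speech_params`, that picks `tts-1` when latency matters and `tts-1-hd` otherwise, and returns keyword arguments for `client.audio.speech.create`. It deliberately does not import `openai`, so the selection logic can be checked on its own.

```python
def speech_params(text: str, *, realtime: bool, voice: str = "alloy") -> dict:
    """Build keyword arguments for client.audio.speech.create (hypothetical helper)."""
    # tts-1: lowest latency, suited to real-time playback.
    # tts-1-hd: higher quality (less static in some cases), at the cost of latency.
    model = "tts-1" if realtime else "tts-1-hd"
    return {"model": model, "voice": voice, "input": text}
```

Usage: `client.audio.speech.create(**speech_params("Hello!", realtime=False)).write_to_file("output/speech_hd.mp3")`.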

## Supported output formats
The default response format is `mp3`, but other formats such as `opus`, `aac`, and `flac` are available:
* **Opus**: For internet streaming and communication, with low latency.
* **AAC**: For digital audio compression; preferred by YouTube, Android, and iOS.
* **FLAC**: For lossless audio compression; favored by audio enthusiasts for archiving.
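Because the `response_format` parameter (passed as `"mp3"` in the Turkish example) determines how the returned bytes are encoded, it helps to keep the file extension in sync with it. This is a minimal sketch: `output_path` and `SUPPORTED_FORMATS` are hypothetical names, the set lists only the formats named in this section, and using the format name as the file extension is a convention assumed here.

```python
from pathlib import Path

# Only the formats described in this section; the API may accept others.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac"}


def output_path(stem: str, response_format: str = "mp3") -> Path:
    """Return a file path whose extension matches the requested response_format."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # with_suffix() replaces any existing extension on the stem.
    return Path(stem).with_suffix(f".{response_format}")
```

For example, with `fmt = "flac"`, pass `response_format=fmt` to `client.audio.speech.create(...)` and save the result with `write_to_file(output_path("output/speech", fmt))`.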

## Supported languages
The TTS model generally follows the Whisper model in terms of language support. Whisper supports the following languages, and TTS performs well in them even though the current voices are optimized for English:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

You can generate spoken audio in these languages by providing the input text in the language of your choice.
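Since the language is taken from the input text itself, generating several languages is just a loop over texts. The sketch below reuses the two sentences from the examples above and pairs each with an output path and request arguments. `plan_speech_jobs` and the `speech_<language>.mp3` naming scheme are hypothetical.

```python
from pathlib import Path


def plan_speech_jobs(phrases: dict[str, str], out_dir: str = "output") -> list[tuple[Path, dict]]:
    """Pair each language's text with an output path and speech request arguments."""
    jobs = []
    for language, text in phrases.items():
        # No language parameter is sent; the spoken language follows the input text.
        params = {"model": "tts-1", "voice": "alloy", "input": text}
        path = Path(out_dir) / f"speech_{language.lower()}.mp3"
        jobs.append((path, params))
    return jobs


phrases = {
    "English": "Today is a wonderful day to build something people love!",
    "Turkish": "Bugün kod yazmak için çok güzel bir gün!",
}
```

Each job can then be run with `client.audio.speech.create(**params).write_to_file(path)`.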

## Streaming real-time audio
The Speech API supports real-time audio streaming using [chunked transfer encoding](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding). This means the audio can be played before the full file has been generated and made available.
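At the consuming end, chunked delivery simply means handling the body piece by piece as it arrives. The hypothetical helper below writes each chunk to a binary sink and flushes immediately, so a reader on the other side (an audio player reading a pipe, for instance) can start before the last byte is received.

```python
from typing import BinaryIO, Iterable


def relay_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write audio chunks to `sink` as they arrive and return the total byte count."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        # Flush per chunk so downstream consumers see data without waiting for the end.
        sink.flush()
        total += len(chunk)
    return total
```

With the `openai` Python SDK, the chunks could come from `response.iter_bytes()` inside a `with client.audio.speech.with_streaming_response.create(...) as response:` block; confirm that method against your installed SDK version.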

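```python
import os

from openai import OpenAI

client = OpenAI()

speech_file_path = "output/speech.mp3"
os.makedirs(os.path.dirname(speech_file_path), exist_ok=True)

# Stream the generated audio to disk chunk by chunk as the server sends it,
# instead of waiting for the complete response body
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello world! This is a streaming test.",
) as response:
    response.stream_to_file(speech_file_path)
```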

```python
from pydub import AudioSegment
from pydub.playback import play

audio = AudioSegment.from_file(speech_file_path, format="mp3")
play(audio)
```

```text
Input #0, wav, from '/var/folders/_2/4yj9mbbn2_zg36jb021hl_gh0000gn/T/tmp_oka5w7h.wav':
  Duration: 00:00:02.33, bitrate: 384 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
```