# AWS Polly (Text to Speech)

In [1]:
!pip install boto3



AWS Polly is a text-to-speech (TTS) service that converts text into lifelike speech. The examples below show how to use it from Python with the Boto3 SDK.

### Prerequisites
Make sure the Boto3 library is installed before starting; the `!pip install boto3` cell at the top of this notebook does that.

You also need to configure AWS credentials (access key and secret key) using the AWS CLI or environment variables.
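
Before creating a client, it can help to see which of the standard AWS environment variables are set. The sketch below is only a convenience check: the helper name `missing_aws_env_vars` is hypothetical, and a missing variable is not necessarily a problem, because Boto3 can also read credentials from the shared file that `aws configure` writes.

```python
import os

# Standard environment variable names that Boto3 reads
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names from AWS_ENV_VARS that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in AWS_ENV_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("All AWS environment variables are set.")
```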

### Example 1: Basic Text-to-Speech Conversion
This example shows a basic implementation where we convert text to speech and save it as an MP3 file.

In [None]:
import boto3

# Credentials are resolved by Boto3's default chain (environment variables,
# the shared credentials file written by `aws configure`, or an IAM role).
# Avoid hardcoding access keys in a notebook, and avoid using root account keys.
AWS_REGION_NAME = "us-west-2"

# Create the AWS Polly client once and reuse it in the cells below
polly_client = boto3.client('polly', region_name=AWS_REGION_NAME)


In [5]:
# Convert text to speech
response = polly_client.synthesize_speech(
    Text='Hello, AWS Polly! This is a simple text-to-speech example.',
    OutputFormat='mp3',
    VoiceId='Joanna'
)

# Save the audio to an MP3 file
with open("speech.mp3", "wb") as file:
    file.write(response['AudioStream'].read())

print("Speech saved as 'speech.mp3'")

Speech saved as 'speech.mp3'


In [8]:
from IPython.display import Audio, display

# Path to your mp3 file
file_path = "speech.mp3"

# Create an Audio object and play it in the notebook
audio = Audio(file_path, autoplay=False)
display(audio)
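
A single `synthesize_speech` request accepts only a limited amount of text (the documented limit has been 3,000 billed characters; check the current quota before relying on that number). For longer input, one option is to split the text at sentence boundaries and synthesize each piece separately. The helper below is a minimal sketch: `split_text` and its `max_chars` parameter are hypothetical names, and the sentence detection is a simple regular expression, not a full tokenizer.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking between sentences.

    A single sentence longer than max_chars is hard-split by length.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two! Three? Four.", max_chars=10))
# → ['One. Two!', 'Three?', 'Four.']
```

Each chunk can then be passed as `Text` to its own `synthesize_speech` call, and the resulting audio written to separate files or appended to one.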


### Example 2: Supported Voices and Languages
This example lists the supported voices and languages in AWS Polly.

In [9]:
# Get available voices
response = polly_client.describe_voices()

# Print the available voices
for voice in response['Voices']:
    print(f"Voice Name: {voice['Name']}, Language: {voice['LanguageName']}, Gender: {voice['Gender']}")

Voice Name: Isabelle, Language: Belgian French, Gender: Female
Voice Name: Danielle, Language: US English, Gender: Female
Voice Name: Gregory, Language: US English, Gender: Male
Voice Name: Burcu, Language: Turkish, Gender: Female
Voice Name: Jitka, Language: Czech, Gender: Female
Voice Name: Sabrina, Language: Swiss Standard German, Gender: Female
Voice Name: Kevin, Language: US English, Gender: Male
Voice Name: Filiz, Language: Turkish, Gender: Female
Voice Name: Elin, Language: Swedish, Gender: Female
Voice Name: Astrid, Language: Swedish, Gender: Female
Voice Name: Tatyana, Language: Russian, Gender: Female
Voice Name: Maxim, Language: Russian, Gender: Male
Voice Name: Carmen, Language: Romanian, Gender: Female
Voice Name: Inês, Language: Portuguese, Gender: Female
Voice Name: Cristiano, Language: Portuguese, Gender: Male
Voice Name: Vitória, Language: Brazilian Portuguese, Gender: Female
Voice Name: Ricardo, Language: Brazilian Portuguese, Gender: Male
Voice Name: Camila, Language ... (output truncated)
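
The flat list is long, so grouping it by language can make it easier to scan. The sketch below uses a hypothetical helper, `group_voices_by_language`, and a few sample entries copied from the output above so that it runs without an AWS call; in the notebook you would pass `response['Voices']` instead. To filter on the server side, `describe_voices` also accepts a `LanguageCode` argument such as `'en-US'`.

```python
from collections import defaultdict


def group_voices_by_language(voices):
    """Map each LanguageName to a sorted list of voice names."""
    grouped = defaultdict(list)
    for voice in voices:
        grouped[voice["LanguageName"]].append(voice["Name"])
    return {language: sorted(names) for language, names in sorted(grouped.items())}


# Sample entries copied from the output above; in the notebook you would
# pass response['Voices'] instead.
sample_voices = [
    {"Name": "Danielle", "LanguageName": "US English", "Gender": "Female"},
    {"Name": "Gregory", "LanguageName": "US English", "Gender": "Male"},
    {"Name": "Burcu", "LanguageName": "Turkish", "Gender": "Female"},
    {"Name": "Filiz", "LanguageName": "Turkish", "Gender": "Female"},
]

for language, names in group_voices_by_language(sample_voices).items():
    print(f"{language}: {', '.join(names)}")
```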

### Example 3: Converting Text to Speech with SSML
Speech Synthesis Markup Language (SSML) can be used to customize how Polly reads the text.

In [12]:
# SSML input with customized pauses and emphasis
ssml_text = """
<speak>
    Hello, <break time="500ms"/> welcome to the world of AWS Polly.
    <emphasis level="strong">This is amazing!</emphasis>
</speak>
"""

# Convert text to speech using SSML
response = polly_client.synthesize_speech(
    TextType='ssml',
    Text=ssml_text,
    OutputFormat='mp3',
    VoiceId='Matthew'
)

# Save the audio to an MP3 file
file_path = "ssml_speech.mp3"
with open(file_path, "wb") as file:
    file.write(response['AudioStream'].read())

print("Speech saved as 'ssml_speech.mp3'")

Speech saved as 'ssml_speech.mp3'


In [13]:
# Create an Audio object and play it in the notebook
audio = Audio(file_path, autoplay=False)
display(audio)
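
Because SSML is XML, characters such as `&` and `<` in the spoken text must be escaped, or the request will be rejected as invalid SSML. The sketch below shows one way to build the markup from plain strings; `build_ssml` and `pause_ms` are hypothetical names, and the helper only covers the `<speak>` and `<break>` tags used in this example.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500):
    """Wrap sentences in <speak>, escaping XML characters and adding pauses."""
    pause = f'<break time="{int(pause_ms)}ms"/>'
    escaped = [escape(sentence) for sentence in sentences]
    return f"<speak>{pause.join(escaped)}</speak>"


ssml_text = build_ssml(["Tom & Jerry", "5 < 7"], pause_ms=300)
print(ssml_text)
# → <speak>Tom &amp; Jerry<break time="300ms"/>5 &lt; 7</speak>
```

The resulting string can be passed as `Text` together with `TextType='ssml'`, as in the cell above.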

### Example 4: Change Voice Attributes
AWS Polly offers a variety of voices, male and female, across different accents and languages. Changing the `VoiceId` parameter selects a different voice; this example uses Brian, a British English male voice.

In [14]:
# Convert text to speech with a different voice
response = polly_client.synthesize_speech(
    Text='Hi, this is Brian from AWS Polly.',
    OutputFormat='mp3',
    VoiceId='Brian'
)

# Save the audio to an MP3 file
file_path = "brian_speech.mp3"
with open(file_path, "wb") as file:
    file.write(response['AudioStream'].read())

print("Speech saved as 'brian_speech.mp3'")

Speech saved as 'brian_speech.mp3'


In [15]:
audio = Audio(file_path, autoplay=False)
display(audio)
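
The examples so far repeat the same synthesize-then-save steps with different arguments, so they could be folded into one function. The sketch below is one possible shape for it: `synthesize_to_file` is a hypothetical helper, and `FakePollyClient` is a stand-in that exists only so the code can be tried without AWS access. In the notebook you would pass `polly_client` instead.

```python
import io
import os
import tempfile


def synthesize_to_file(client, text, voice_id, path, output_format="mp3"):
    """Synthesize `text` with `voice_id` and write the audio to `path`.

    Returns the number of bytes written.
    """
    response = client.synthesize_speech(
        Text=text, OutputFormat=output_format, VoiceId=voice_id
    )
    audio = response["AudioStream"].read()
    with open(path, "wb") as file:
        file.write(audio)
    return len(audio)


class FakePollyClient:
    """Stand-in for the real client so the helper can be tried offline."""

    def synthesize_speech(self, **kwargs):
        payload = b"fake-audio:" + kwargs["VoiceId"].encode()
        return {"AudioStream": io.BytesIO(payload)}


demo_path = os.path.join(tempfile.gettempdir(), "fake_brian.mp3")
written = synthesize_to_file(FakePollyClient(), "Hi there.", "Brian", demo_path)
print(f"Wrote {written} bytes to {demo_path}")
```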

### Example 5: Using AWS Polly in Real-Time Streaming
This example reads Polly's audio stream in small chunks and writes each chunk to a file. The section after the example explains how this differs from true real-time playback.

In [21]:
# Convert text to speech for streaming
response = polly_client.synthesize_speech(
    Text="This is a real-time text-to-speech streaming example with AWS Polly.",
    OutputFormat='mp3',
    VoiceId='Amy'
)

# Read the audio stream in 1024-byte chunks and write each chunk to the file
file_path = "streaming_speech.mp3"
if "AudioStream" in response:
    with open(file_path, "wb") as file:
        stream = response['AudioStream']
        data = stream.read(1024)
        while data:
            file.write(data)
            data = stream.read(1024)
    print("Speech saved as 'streaming_speech.mp3'")
else:
    print("The response did not contain an audio stream")

Speech saved as 'streaming_speech.mp3'


In [22]:
audio = Audio(file_path, autoplay=False)
display(audio)

### What Does "Real-Time" Mean Here?
The chunked read from Example 5 is repeated here:

```python
file_path = "streaming_speech.mp3"
if "AudioStream" in response:
    with open(file_path, "wb") as file:
        stream = response['AudioStream']
        data = stream.read(1024)
        while data:
            file.write(data)
            data = stream.read(1024)
```

The phrase "real-time streaming" here refers to how the audio data is **fetched and saved** in chunks, rather than fetched as one complete file.

**Here's a breakdown:**
- When AWS Polly processes your request, the response contains an audio stream object.
- `response['AudioStream']` is a streaming object from which you can read the generated speech in small chunks.
- The loop reads and writes this stream in chunks (`1024` bytes each in this example), so the entire file never has to be held in memory at once. This matters mostly for large files.

This is a basic **streaming download**, not "real-time" playback in the sense of hearing the audio while it is being generated. The code reads the generated content incrementally and saves it to a file instead of reading the entire content in one call.

### How Would Real-Time Audio Playback Work?
If you want true **real-time playback**, stream the audio directly to a playback device as it is received instead of writing it to a file and playing it afterwards. Boto3 does not include audio playback, so you would need to pass the chunks to a media player or audio library yourself.

By contrast, here's what the code above **does**:
- **Chunked Download**: The code reads the generated audio from the `AudioStream` in chunks of `1024` bytes and writes it directly to a file (`streaming_speech.mp3`).
- **Incremental Saving**: Downloading and saving incrementally lets you handle larger files without running into memory issues.
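
As a rough illustration of the difference, the sketch below separates the chunked read into a generator and then feeds the chunks to an external player as they arrive. It rests on assumptions that are not part of the original example: `iter_audio_chunks` and `play_with_ffplay` are hypothetical helpers, and the playback function assumes the `ffplay` command-line tool is installed and on your `PATH`. Only the chunking logic is exercised here, using an in-memory stream.

```python
import io
import subprocess


def iter_audio_chunks(stream, chunk_size=1024):
    """Yield successive chunks from a file-like audio stream until it is empty."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data


def play_with_ffplay(stream, chunk_size=1024):
    """Pipe chunks to ffplay's stdin as they arrive (assumes ffplay is installed)."""
    player = subprocess.Popen(
        ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", "-"],
        stdin=subprocess.PIPE,
    )
    try:
        for chunk in iter_audio_chunks(stream, chunk_size):
            player.stdin.write(chunk)
    finally:
        player.stdin.close()
        player.wait()


# Offline check of the chunking logic with an in-memory stream
chunks = list(iter_audio_chunks(io.BytesIO(b"x" * 2500), chunk_size=1024))
print([len(chunk) for chunk in chunks])
# → [1024, 1024, 452]
```

With a real response, calling `play_with_ffplay(response['AudioStream'])` would start playback while the rest of the audio is still downloading, which is closer to what "real-time" usually means.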

### Example 6: Save Speech in Multiple File Formats
Polly supports several output formats, selected with the `OutputFormat` parameter: MP3 (`mp3`), Ogg Vorbis (`ogg_vorbis`), and raw PCM (`pcm`). Here's how you can save the output in OGG format:

In [23]:
# Convert text to speech in OGG format
response = polly_client.synthesize_speech(
    Text='This is a demo for OGG output format.',
    OutputFormat='ogg_vorbis',
    VoiceId='Joanna'
)

In [24]:
# Save the audio in OGG format
with open("speech.ogg", "wb") as file:
    file.write(response['AudioStream'].read())

print("Speech saved as 'speech.ogg'")

Speech saved as 'speech.ogg'
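
PCM output needs one extra step, because it is raw sample data with no file header and most players will not open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module. It assumes 16-bit mono samples at 16,000 Hz, which matches Polly's documented PCM output at its default sample rate as far as I know; adjust `sample_rate` if you request a different `SampleRate`. The helper name `pcm_to_wav` is hypothetical, and the demo uses silence in place of a real Polly response.

```python
import os
import tempfile
import wave


def pcm_to_wav(pcm_bytes, path, sample_rate=16000):
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Offline check with one second of silence instead of a real Polly response
silence = b"\x00\x00" * 16000
wav_path = os.path.join(tempfile.gettempdir(), "polly_pcm_demo.wav")
pcm_to_wav(silence, wav_path)

with wave.open(wav_path, "rb") as wav_file:
    print(wav_file.getnchannels(), wav_file.getframerate(), wav_file.getnframes())
# → 1 16000 16000
```

With a real request made using `OutputFormat='pcm'`, you would pass `response['AudioStream'].read()` as `pcm_bytes`.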


These examples demonstrate different ways of using AWS Polly to convert text into speech: listing and changing voices, using SSML, reading the audio stream in chunks, and working with different output formats.