## Install Boto3 to Access AWS APIs

Boto3 is the official AWS SDK for Python. It lets developers interact with and manage AWS services, such as Amazon Polly, programmatically.

To learn more about Boto3, see the documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

In [1]:
#!pip3 install boto3

In [2]:
import os
os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_AWS_ACCESS_KEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_AWS_SECRET_ACCESS_KEY'
os.environ['AWS_DEFAULT_REGION'] = 'YOUR_REGION'
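
Before creating a client, it can help to confirm that the three variables above are actually set. The following is a minimal sketch; the helper name `missing_aws_env_vars` is hypothetical and not part of Boto3.

```python
import os

# The three variables set in the cell above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names of required AWS variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print(f"Missing AWS settings: {', '.join(missing)}")
else:
    print("All three AWS environment variables are set.")
```

Boto3 can also pick up credentials from other sources, such as the shared `~/.aws/credentials` file, so a missing variable is not necessarily an error.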

## Synthesize Speech Using Amazon Polly Python SDK

We can use the SynthesizeSpeech API to convert text into speech. To learn more about the request and response format, see the documentation:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/polly/client/synthesize_speech.html#

In [3]:
# response = client.synthesize_speech(
#     Engine='standard'|'neural'|'long-form'|'generative',
#     LanguageCode='arb'|'cmn-CN'|'cy-GB'|'da-DK'|'de-DE'|'en-AU'|'en-GB'|'en-GB-WLS'|'en-IN'|'en-US'|'es-ES'|'es-MX'|'es-US'|'fr-CA'|'fr-FR'|'is-IS'|'it-IT'|'ja-JP'|'hi-IN'|'ko-KR'|'nb-NO'|'nl-NL'|'pl-PL'|'pt-BR'|'pt-PT'|'ro-RO'|'ru-RU'|'sv-SE'|'tr-TR'|'en-NZ'|'en-ZA'|'ca-ES'|'de-AT'|'yue-CN'|'ar-AE'|'fi-FI'|'en-IE'|'nl-BE'|'fr-BE'|'cs-CZ'|'de-CH'|'en-SG',
#     LexiconNames=[
#         'string',
#     ],
#     OutputFormat='json'|'mp3'|'ogg_opus'|'ogg_vorbis'|'pcm',
#     SampleRate='string',
#     SpeechMarkTypes=[
#         'sentence'|'ssml'|'viseme'|'word',
#     ],
#     Text='string',
#     TextType='ssml'|'text',
#     VoiceId='Aditi'|'Amy'|'Astrid'|'Bianca'|'Brian'|'Camila'|'Carla'|'Carmen'|'Celine'|'Chantal'|'Conchita'|'Cristiano'|'Dora'|'Emma'|'Enrique'|'Ewa'|'Filiz'|'Gabrielle'|'Geraint'|'Giorgio'|'Gwyneth'|'Hans'|'Ines'|'Ivy'|'Jacek'|'Jan'|'Joanna'|'Joey'|'Justin'|'Karl'|'Kendra'|'Kevin'|'Kimberly'|'Lea'|'Liv'|'Lotte'|'Lucia'|'Lupe'|'Mads'|'Maja'|'Marlene'|'Mathieu'|'Matthew'|'Maxim'|'Mia'|'Miguel'|'Mizuki'|'Naja'|'Nicole'|'Olivia'|'Penelope'|'Raveena'|'Ricardo'|'Ruben'|'Russell'|'Salli'|'Seoyeon'|'Takumi'|'Tatyana'|'Vicki'|'Vitoria'|'Zeina'|'Zhiyu'|'Aria'|'Ayanda'|'Arlet'|'Hannah'|'Arthur'|'Daniel'|'Liam'|'Pedro'|'Kajal'|'Hiujin'|'Laura'|'Elin'|'Ida'|'Suvi'|'Ola'|'Hala'|'Andres'|'Sergio'|'Remi'|'Adriano'|'Thiago'|'Ruth'|'Stephen'|'Kazuha'|'Tomoko'|'Niamh'|'Sofie'|'Lisa'|'Isabelle'|'Zayd'|'Danielle'|'Gregory'|'Burcu'|'Jitka'|'Sabrina'|'Jasmine'|'Jihye'
# )
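
Not every voice supports every engine, so it can be useful to filter the voice list before choosing `VoiceId` and `Engine`. The sketch below filters a list shaped like the `Voices` field returned by the client's `describe_voices` call. The helper name `voices_for_engine` and the sample entries (`VoiceA`, `VoiceB`, `VoiceC`) are made up for illustration and are not real API output.

```python
def voices_for_engine(voices, engine, language_code=None):
    """Return sorted voice IDs that support `engine`, optionally limited to one language.

    `voices` is expected to look like describe_voices()["Voices"]:
    a list of dicts with "Id", "LanguageCode", and "SupportedEngines" keys.
    """
    return sorted(
        voice["Id"]
        for voice in voices
        if engine in voice.get("SupportedEngines", [])
        and (language_code is None or voice.get("LanguageCode") == language_code)
    )


# Hand-written sample data with hypothetical voice names.
sample_voices = [
    {"Id": "VoiceA", "LanguageCode": "en-US", "SupportedEngines": ["standard", "neural"]},
    {"Id": "VoiceB", "LanguageCode": "en-US", "SupportedEngines": ["standard"]},
    {"Id": "VoiceC", "LanguageCode": "en-GB", "SupportedEngines": ["neural"]},
]

print(voices_for_engine(sample_voices, "neural", "en-US"))  # ['VoiceA']

# With a live client (requires valid credentials):
# voices = boto3.client("polly").describe_voices(LanguageCode="en-US")["Voices"]
# print(voices_for_engine(voices, "neural"))
```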

In [4]:
import boto3
polly_client = boto3.client('polly')

In [5]:
input_text = """
    Amazon Polly is a cloud service that converts text into lifelike speech. 
    You can use Amazon Polly to develop applications that increase engagement 
    and accessibility.
"""

response = polly_client.synthesize_speech(
    Engine='generative',
    LanguageCode='en-US',
    OutputFormat='mp3',
    Text=input_text,
    TextType='text',
    VoiceId='Joanna'
)
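
The text above is short, but longer inputs may exceed the size limit of a single request. The sketch below splits text at sentence boundaries so that each piece can be sent as its own request. It assumes a limit of 3000 characters per request; check the current Amazon Polly quotas before relying on that number. The helper name `split_text` is hypothetical.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most `max_chars`, preferring sentence boundaries."""
    # Collapse whitespace, then split after ., ! or ? followed by a space.
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("A b. C d. E f.", max_chars=10))  # ['A b. C d.', 'E f.']
```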

In [6]:
text_with_ssml = """
<speak>
    Amazon Polly is a cloud service that converts text into lifelike speech. <break time="2s"/> 
    It is developed using advanced <sub alias="natural language processing"> NLP </sub> techniques.
    
    <amazon:domain name="news">
    You can use Amazon Polly to develop applications that increase engagement and accessibility.
    Amazon Polly supports multiple languages and includes a variety of lifelike voices.
    </amazon:domain>
</speak>
"""

ssml_response = polly_client.synthesize_speech(
    Engine='neural',
    LanguageCode='en-US',
    OutputFormat='mp3',
    Text=text_with_ssml,
    TextType='ssml',
    VoiceId='Joanna'
)
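
Because SSML is XML, characters such as `&` and `<` in the spoken text must be escaped, or the request will be rejected as invalid SSML. The sketch below wraps plain text in a `<speak>` element using the standard library's `xml.sax.saxutils.escape`. The helper name `to_ssml` and its `pause_after` parameter are hypothetical.

```python
from xml.sax.saxutils import escape


def to_ssml(text, pause_after=None):
    """Wrap plain text in <speak>, escaping &, < and > in the text.

    `pause_after` is an optional duration such as "1s" appended as a <break/> tag.
    """
    body = escape(text)
    if pause_after:
        body += f'<break time="{pause_after}"/>'
    return f"<speak>{body}</speak>"


print(to_ssml("Q&A: is 3 < 5?", pause_after="1s"))
# <speak>Q&amp;A: is 3 &lt; 5?<break time="1s"/></speak>
```

The result can be passed as `Text` with `TextType='ssml'`, in the same way as `text_with_ssml` above.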

## Analyse the Output of the Speech Synthesis Task

In [7]:
response

{'ResponseMetadata': {'RequestId': 'ac636da4-a389-4a6f-a6cb-e9877cc03848',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ac636da4-a389-4a6f-a6cb-e9877cc03848',
   'x-amzn-requestcharacters': '165',
   'date': 'Thu, 04 Sep 2025 13:03:18 GMT',
   'content-type': 'audio/mpeg',
   'transfer-encoding': 'chunked',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'ContentType': 'audio/mpeg',
 'RequestCharacters': 165,
 'AudioStream': <botocore.response.StreamingBody at 0x10616c7c0>}

In [8]:
http_status_code = response['ResponseMetadata']['HTTPStatusCode']
http_status_code

200

In [9]:
with open('speech_input_text.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
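
The `AudioStream` object can be read only once, and reading it in a single call holds the whole audio file in memory. For longer audio, one option is to copy the stream to disk in chunks and close it afterwards. The sketch below shows the idea with a hypothetical helper, `save_audio_stream`, demonstrated on an in-memory `io.BytesIO` stand-in rather than a real Polly response.

```python
import io
import os
import tempfile
from contextlib import closing


def save_audio_stream(stream, path, chunk_size=8192):
    """Copy a readable, closable byte stream to `path` in chunks.

    Returns the number of bytes written. The stream is closed when done.
    """
    written = 0
    with closing(stream), open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read means the stream is exhausted
                break
            written += out.write(chunk)
    return written


# Stand-in for response['AudioStream']; 20000 zero bytes instead of real audio.
fake_stream = io.BytesIO(b"\x00" * 20000)
demo_path = os.path.join(tempfile.gettempdir(), "demo_output.bin")
print(save_audio_stream(fake_stream, demo_path))  # 20000
```

With a real response, the call would look like `save_audio_stream(response['AudioStream'], 'speech_input_text.mp3')`.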

In [10]:
ssml_response

{'ResponseMetadata': {'RequestId': 'd77b6351-69d2-49e0-9003-dc1b55626a35',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd77b6351-69d2-49e0-9003-dc1b55626a35',
   'x-amzn-requestcharacters': '320',
   'date': 'Thu, 04 Sep 2025 13:03:19 GMT',
   'content-type': 'audio/mpeg',
   'transfer-encoding': 'chunked',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'ContentType': 'audio/mpeg',
 'RequestCharacters': 320,
 'AudioStream': <botocore.response.StreamingBody at 0x10616d510>}

In [11]:
with open('speech_text_with_ssml.mp3', 'wb') as file:
    file.write(ssml_response['AudioStream'].read())

## Stream Audio Output of the Speech Synthesis Task

In [12]:
#!pip3 install pygame

In [13]:
from pygame import mixer

mixer.init()
mixer.music.load('speech_input_text.mp3')
mixer.music.play()

pygame 2.6.1 (SDL 2.28.4, Python 3.13.7)
Hello from the pygame community. https://www.pygame.org/contribute.html




## Error Handling with Retry Logic

We can configure different settings regarding how the API will attempt to recover from failures.

Details about different retry settings can be found in this documentation link:
https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html

In [14]:
from botocore.config import Config

# Retry settings are applied to every call made through this client.
polly_client = boto3.client(
    'polly',
    config=Config(
        retries={
            'max_attempts': 3,
            'mode': 'standard'
        }
    )
)
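
To make the idea behind such retry settings concrete, the sketch below shows a generic retry loop with exponential backoff and jitter. It illustrates the technique only and is not how botocore implements its retry modes: the helper name `call_with_backoff` and all of its parameters are hypothetical, and unlike the SDK it retries on every exception rather than only on errors considered transient.

```python
import random
import time


def call_with_backoff(func, max_attempts=3, base_delay=0.5, max_delay=20.0, sleep=time.sleep):
    """Call func(); on failure, wait with exponential backoff and jitter, then retry.

    Re-raises the last exception once `max_attempts` calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            # The delay ceiling doubles per attempt; jitter picks a random point below it.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Demo: a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated transient failure")
    return "ok"


# Sleeping is disabled here so that the demo finishes immediately.
print(call_with_backoff(flaky_call, sleep=lambda seconds: None))  # ok
```

In practice, the client-level `retries` configuration is usually preferable to a hand-written loop, because the SDK decides which errors are safe to retry.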

In [15]:
input_text = """
    Amazon Polly is a cloud service that converts text into lifelike speech. 
    You can use Amazon Polly to develop applications that increase engagement and accessibility.
"""

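from botocore.exceptions import BotoCoreError, ClientError

response = None  # stays None if the request fails
try:
    response = polly_client.synthesize_speech(
        Engine='generative',
        LanguageCode='en-US',
        OutputFormat='mp3',
        Text=input_text,
        TextType='text',
        VoiceId='Joanna'
    )
except (BotoCoreError, ClientError) as e:
    # ClientError: the service returned an error response.
    # BotoCoreError: a client-side failure, such as missing credentials or a connection problem.
    print(f"Error synthesizing speech: {e}")

In [16]:
# Write the audio only if the request succeeded.
if response is not None:
    with open('speech_input_text.mp3', 'wb') as file:
        file.write(response['AudioStream'].read())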