## Install Boto3 to Access AWS APIs

Boto3 is the official AWS SDK for Python. It lets developers interact with and manage AWS services, such as Amazon Polly, programmatically.

To learn more about Boto3, see the documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

In [None]:
#!pip3 install boto3

To learn how to create and manage access keys, see:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_root-user_manage_add-key.html

To learn how to configure environment variables, see: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-envvars.html

In [None]:
import os
# Replace each placeholder with your own value; do not commit real keys to version control.
os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_AWS_ACCESS_KEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_AWS_SECRET_ACCESS_KEY'
os.environ['AWS_DEFAULT_REGION'] = 'YOUR_REGION'
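
Before creating a client, it can help to confirm that the three variables set above are actually present. The following is a minimal sketch; the helper name `missing_aws_env_vars` is hypothetical and not part of Boto3. Note that Boto3 can also pick up credentials from other sources, such as the shared credentials file, so an empty result here is a convenience check rather than a guarantee that authentication will succeed.

```python
import os

# The three variables this notebook sets in the cell above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("All AWS environment variables are set.")
```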

## Synthesize Speech Using Amazon Polly Python SDK

We can use the Synthesize Speech API to convert text into speech. To learn more about the request and response format, see the documentation:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/polly/client/synthesize_speech.html#

In [None]:
# response = client.synthesize_speech(
#     Engine='standard'|'neural'|'long-form'|'generative',
#     LanguageCode='arb'|'cmn-CN'|'cy-GB'|'da-DK'|'de-DE'|'en-AU'|'en-GB'|'en-GB-WLS'|'en-IN'|'en-US'|'es-ES'|'es-MX'|'es-US'|'fr-CA'|'fr-FR'|'is-IS'|'it-IT'|'ja-JP'|'hi-IN'|'ko-KR'|'nb-NO'|'nl-NL'|'pl-PL'|'pt-BR'|'pt-PT'|'ro-RO'|'ru-RU'|'sv-SE'|'tr-TR'|'en-NZ'|'en-ZA'|'ca-ES'|'de-AT'|'yue-CN'|'ar-AE'|'fi-FI'|'en-IE'|'nl-BE'|'fr-BE'|'cs-CZ'|'de-CH'|'en-SG',
#     VoiceId='Aditi'|'Amy'|'Astrid'|'Bianca'|'Brian'|'Camila'|'Carla'|'Carmen'|'Celine'|'Chantal'|'Conchita'|'Cristiano'|'Dora'|'Emma'|'Enrique'|'Ewa'|'Filiz'|'Gabrielle'|'Geraint'|'Giorgio'|'Gwyneth'|'Hans'|'Ines'|'Ivy'|'Jacek'|'Jan'|'Joanna'|'Joey'|'Justin'|'Karl'|'Kendra'|'Kevin'|'Kimberly'|'Lea'|'Liv'|'Lotte'|'Lucia'|'Lupe'|'Mads'|'Maja'|'Marlene'|'Mathieu'|'Matthew'|'Maxim'|'Mia'|'Miguel'|'Mizuki'|'Naja'|'Nicole'|'Olivia'|'Penelope'|'Raveena'|'Ricardo'|'Ruben'|'Russell'|'Salli'|'Seoyeon'|'Takumi'|'Tatyana'|'Vicki'|'Vitoria'|'Zeina'|'Zhiyu'|'Aria'|'Ayanda'|'Arlet'|'Hannah'|'Arthur'|'Daniel'|'Liam'|'Pedro'|'Kajal'|'Hiujin'|'Laura'|'Elin'|'Ida'|'Suvi'|'Ola'|'Hala'|'Andres'|'Sergio'|'Remi'|'Adriano'|'Thiago'|'Ruth'|'Stephen'|'Kazuha'|'Tomoko'|'Niamh'|'Sofie'|'Lisa'|'Isabelle'|'Zayd'|'Danielle'|'Gregory'|'Burcu'|'Jitka'|'Sabrina'|'Jasmine'|'Jihye',
#     LexiconNames=[
#         'string',
#     ],
#     OutputFormat='json'|'mp3'|'ogg_opus'|'ogg_vorbis'|'pcm',
#     SpeechMarkTypes=[
#         'sentence'|'ssml'|'viseme'|'word',
#     ],
#     TextType='ssml'|'text',
#     Text='string'
# )
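
Because `synthesize_speech` takes keyword arguments, the request can be assembled as a dictionary and unpacked into the call. The sketch below uses a hypothetical helper, `build_polly_request`, which is not part of Boto3. It assumes that input wrapped in `<speak>...</speak>` should be sent as SSML and everything else as plain text; the default voice and engine are illustrative choices only.

```python
def build_polly_request(text, voice_id="Joanna", engine="neural", output_format="mp3"):
    """Assemble keyword arguments for synthesize_speech, inferring TextType from the text."""
    stripped = text.strip()
    # Assumption: SSML input is always wrapped in a plain <speak> element.
    is_ssml = stripped.startswith("<speak>") and stripped.endswith("</speak>")
    return {
        "Engine": engine,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "TextType": "ssml" if is_ssml else "text",
        "Text": text,
    }


params = build_polly_request("Hello from Amazon Polly.")
print(params["TextType"])

# With a configured client, the dictionary can be unpacked into the call:
# response = polly_client.synthesize_speech(**params)
```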

In [2]:
import boto3
polly_client = boto3.client('polly')

In [3]:
input_text = """
    Amazon Polly is a cloud service that converts text into lifelike speech. 
    You can use Amazon Polly to develop applications that increase engagement 
    and accessibility.
"""

In [4]:
response = polly_client.synthesize_speech(
    Engine='generative',
    LanguageCode='en-US',
    VoiceId='Joanna',
    OutputFormat='mp3',
    TextType='text',
    Text=input_text
)

In [5]:
text_with_ssml = """
<speak>
    Amazon Polly is a cloud service that converts text into lifelike speech. <break time="2s"/> 
    It is developed using advanced <sub alias="natural language processing"> NLP </sub> techniques.
    
    <amazon:domain name="news">
    You can use Amazon Polly to develop applications that increase engagement and accessibility.
    Amazon Polly supports multiple languages and includes a variety of lifelike voices.
    </amazon:domain>
</speak>
"""

ssml_response = polly_client.synthesize_speech(
    Engine='neural',
    LanguageCode='en-US',
    VoiceId='Joanna',
    OutputFormat='mp3',
    TextType='ssml',
    Text=text_with_ssml,
)
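
SSML is XML, so characters such as `&` and `<` in the spoken text must be escaped before they are placed inside `<speak>`. The following is a minimal sketch of building SSML from plain sentences; the helper name `build_ssml` and the default pause length are assumptions for illustration, and only the `<break>` tag already used above is generated.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause="1s"):
    """Wrap plain-text sentences in <speak>, escaping XML characters and pausing between them."""
    separator = f' <break time="{pause}"/> '
    body = separator.join(escape(sentence.strip()) for sentence in sentences)
    return f"<speak>{body}</speak>"


ssml = build_ssml(["Q&A sessions start at 5 < 6 pm.", "See you there."])
print(ssml)
```

The resulting string can be passed as `Text` with `TextType='ssml'`, in the same way as `text_with_ssml` above.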

## Analyse the Output of the Speech Synthesis Task

In [6]:
response

{'ResponseMetadata': {'RequestId': '2084bff5-e830-4dd6-a1c4-80e0d9dce64d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '2084bff5-e830-4dd6-a1c4-80e0d9dce64d',
   'x-amzn-requestcharacters': '165',
   'date': 'Sat, 06 Sep 2025 11:46:16 GMT',
   'content-type': 'audio/mpeg',
   'transfer-encoding': 'chunked',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'ContentType': 'audio/mpeg',
 'RequestCharacters': 165,
 'AudioStream': <botocore.response.StreamingBody at 0x105c891b0>}

In [7]:
nchars = response['RequestCharacters']

In [8]:
print("Number of characters converted: ", nchars)

Number of characters converted:  165


In [9]:
http_status_code = response['ResponseMetadata']['HTTPStatusCode']
http_status_code

200

In [10]:
with open('speech_input_text.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
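
The `AudioStream` object can be consumed only once, and calling `.read()` with no argument loads the whole audio into memory. For longer texts, copying the stream in chunks is an alternative. The sketch below uses a hypothetical helper, `save_stream`, and an in-memory `io.BytesIO` object as a stand-in for `response['AudioStream']` so that it runs without AWS access; it assumes only that the stream supports `read(size)`.

```python
import io
import os
import tempfile


def save_stream(stream, path, chunk_size=8192):
    """Copy a readable binary stream to a file in chunks and return the bytes written."""
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # An empty read means the stream is exhausted.
                break
            out.write(chunk)
            total += len(chunk)
    return total


# Stand-in for response['AudioStream'].
fake_stream = io.BytesIO(b"\x00" * 20000)
path = os.path.join(tempfile.gettempdir(), "polly_demo_audio.bin")
written = save_stream(fake_stream, path)
print(written)
```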

In [11]:
ssml_response

{'ResponseMetadata': {'RequestId': '6a9b410f-b332-43d9-ad08-b656345c5247',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '6a9b410f-b332-43d9-ad08-b656345c5247',
   'x-amzn-requestcharacters': '320',
   'date': 'Sat, 06 Sep 2025 11:49:39 GMT',
   'content-type': 'audio/mpeg',
   'transfer-encoding': 'chunked',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'ContentType': 'audio/mpeg',
 'RequestCharacters': 320,
 'AudioStream': <botocore.response.StreamingBody at 0x1066cd330>}

In [12]:
with open('speech_text_with_ssml.mp3', 'wb') as file:
    file.write(ssml_response['AudioStream'].read())

## Play the Audio Output of the Speech Synthesis Task

In [None]:
#!pip3 install pygame

In [15]:
from pygame import mixer

mixer.init()
mixer.music.load('speech_input_text.mp3')
mixer.music.play()

## Error Handling with Retry Logic

We can configure how the client attempts to recover from failed API calls.

The available retry settings are described in this documentation:
https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html

In [16]:
polly_client = boto3.client(
    'polly',
    config=boto3.session.Config(
        retries={
            'max_attempts': 3,
            'mode': 'standard'
        }
    )
)
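
To illustrate the general idea behind a retry policy, here is a minimal sketch of retrying a call with exponential backoff and jitter. It is not how botocore implements its retry modes; the helper `call_with_retries`, its parameters, and the `flaky` function are hypothetical and exist only for this example. The `sleep` parameter is injectable so that the demonstration runs without waiting.

```python
import random
import time


def call_with_retries(func, total_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure, back off exponentially with jitter, up to total_attempts calls."""
    for attempt in range(1, total_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == total_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait a random time between 0 and base_delay * 2^(attempt - 1) seconds.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


calls = {"count": 0}


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["count"])
```

In practice, the client-level `retries` configuration shown above is usually preferable, because the SDK decides which errors are safe to retry.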

In [17]:
input_text = """
    Amazon Polly is a cloud service that converts text into lifelike speech. 
    You can use Amazon Polly to develop applications that increase engagement and accessibility.
"""

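from botocore.exceptions import BotoCoreError, ClientError

try:
    response = polly_client.synthesize_speech(
        Engine='generative',
        LanguageCode='en-US',
        OutputFormat='mp3',
        Text=input_text,
        TextType='text',
        VoiceId='Joanna'
    )
except (BotoCoreError, ClientError) as e:
    # Raised once the configured retries are exhausted or the request is invalid.
    print(f"Error synthesizing speech: {e}")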

In [18]:
response

{'ResponseMetadata': {'RequestId': '39f8ec56-f057-4d88-a370-0ad7218a0fad',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '39f8ec56-f057-4d88-a370-0ad7218a0fad',
   'x-amzn-requestcharacters': '165',
   'date': 'Sat, 06 Sep 2025 12:15:56 GMT',
   'content-type': 'audio/mpeg',
   'transfer-encoding': 'chunked',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'ContentType': 'audio/mpeg',
 'RequestCharacters': 165,
 'AudioStream': <botocore.response.StreamingBody at 0x1057dc970>}