## Amazon Polly

In the following examples, I will show how to interact with the Polly service using boto3.

We start by creating a Polly client:

In [21]:
import boto3
import IPython.display as ipd
from pprint import pprint
from contextlib import closing

session = boto3.session.Session()
polly_client = session.client('polly')

### Describe voices

In our first example, we will look at the `describe_voices` method, which returns all available voices that we can use to synthesize speech.

In [11]:
response = polly_client.describe_voices()

print(len(response['Voices']))

pprint(response['Voices'])

52
[{'Gender': 'Female',
  'Id': 'Filiz',
  'LanguageCode': 'tr-TR',
  'LanguageName': 'Turkish',
  'Name': 'Filiz'},
 {'Gender': 'Female',
  'Id': 'Astrid',
  'LanguageCode': 'sv-SE',
  'LanguageName': 'Swedish',
  'Name': 'Astrid'},
 {'Gender': 'Female',
  'Id': 'Tatyana',
  'LanguageCode': 'ru-RU',
  'LanguageName': 'Russian',
  'Name': 'Tatyana'},
 {'Gender': 'Male',
  'Id': 'Maxim',
  'LanguageCode': 'ru-RU',
  'LanguageName': 'Russian',
  'Name': 'Maxim'},
 {'Gender': 'Female',
  'Id': 'Carmen',
  'LanguageCode': 'ro-RO',
  'LanguageName': 'Romanian',
  'Name': 'Carmen'},
 {'Gender': 'Female',
  'Id': 'Ines',
  'LanguageCode': 'pt-PT',
  'LanguageName': 'Portuguese',
  'Name': 'Inês'},
 {'Gender': 'Male',
  'Id': 'Cristiano',
  'LanguageCode': 'pt-PT',
  'LanguageName': 'Portuguese',
  'Name': 'Cristiano'},
 {'Gender': 'Female',
  'Id': 'Vitoria',
  'LanguageCode': 'pt-BR',
  'LanguageName': 'Brazilian Portuguese',
  'Name': 'Vitória'},
 {'Gender': 'Male',
  'Id': 'Ricardo',
  'L

As we can see, there are many voices to choose from (52 at the time of writing). To narrow the list, `describe_voices` accepts a `LanguageCode` parameter that restricts the response to voices in the given language. A `LanguageCode` consists of an ISO 639 language code and an ISO 3166 country code separated by a dash (`-`), for example `pl-PL` or `en-US`.

Let's try to get all Polish voices:

In [14]:
response = polly_client.describe_voices(LanguageCode='pl-PL')

print(len(response['Voices']))

pprint(response['Voices'])

4
[{'Gender': 'Female',
  'Id': 'Maja',
  'LanguageCode': 'pl-PL',
  'LanguageName': 'Polish',
  'Name': 'Maja'},
 {'Gender': 'Male',
  'Id': 'Jan',
  'LanguageCode': 'pl-PL',
  'LanguageName': 'Polish',
  'Name': 'Jan'},
 {'Gender': 'Male',
  'Id': 'Jacek',
  'LanguageCode': 'pl-PL',
  'LanguageName': 'Polish',
  'Name': 'Jacek'},
 {'Gender': 'Female',
  'Id': 'Ewa',
  'LanguageCode': 'pl-PL',
  'LanguageName': 'Polish',
  'Name': 'Ewa'}]
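
The voice records returned by `describe_voices` are plain dictionaries, so they can also be grouped locally instead of calling the API once per language. The following is a minimal sketch; `voices_by_language` is a hypothetical helper name, and the sample records are copied from the outputs above so the snippet runs without AWS access.

```python
from collections import defaultdict


def voices_by_language(voices):
    """Group voice Ids by their LanguageCode (hypothetical helper)."""
    grouped = defaultdict(list)
    for voice in voices:
        grouped[voice['LanguageCode']].append(voice['Id'])
    return dict(grouped)


# Sample records copied from the describe_voices outputs shown above.
sample_voices = [
    {'Gender': 'Female', 'Id': 'Tatyana', 'LanguageCode': 'ru-RU',
     'LanguageName': 'Russian', 'Name': 'Tatyana'},
    {'Gender': 'Male', 'Id': 'Maxim', 'LanguageCode': 'ru-RU',
     'LanguageName': 'Russian', 'Name': 'Maxim'},
    {'Gender': 'Female', 'Id': 'Maja', 'LanguageCode': 'pl-PL',
     'LanguageName': 'Polish', 'Name': 'Maja'},
    {'Gender': 'Male', 'Id': 'Jan', 'LanguageCode': 'pl-PL',
     'LanguageName': 'Polish', 'Name': 'Jan'},
]

grouped = voices_by_language(sample_voices)
print(grouped)
```

With a real client, the same helper could be called as `voices_by_language(polly_client.describe_voices()['Voices'])`.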


### Synthesize speech - creating audio file

In our first synthesis example, we will use the `synthesize_speech` method to create an `mp3` audio file from the provided text. Polly can output audio as `mp3`, `ogg_vorbis`, or `pcm`. We will use the `en-US` voice `Matthew`, passed as `VoiceId`. For the text, we will use a short summary of the Amazon Polly service from the official AWS documentation.

In [24]:
text = """
Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk,
and build entirely new categories of speech-enabled products.
Amazon Polly is a Text-to-Speech service that uses advanced deep learning technologies
to synthesize speech that sounds like a human voice.
With dozens of lifelike voices across a variety of languages,
you can select the ideal voice and build speech-enabled applications that work in many different countries.
"""

response = polly_client.synthesize_speech(
    OutputFormat='mp3',
    Text=text,
    VoiceId='Matthew'
)
pprint(response)

{'AudioStream': <botocore.response.StreamingBody object at 0x7f0da9e41e48>,
 'ContentType': 'audio/mpeg',
 'RequestCharacters': '482',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-type': 'audio/mpeg',
                                      'date': 'Sat, 24 Mar 2018 17:45:38 GMT',
                                      'transfer-encoding': 'chunked',
                                      'x-amzn-requestcharacters': '482',
                                      'x-amzn-requestid': '2a8b4562-2f8b-11e8-bbef-d91983422c12'},
                      'HTTPStatusCode': 200,
                      'RequestId': '2a8b4562-2f8b-11e8-bbef-d91983422c12',
                      'RetryAttempts': 0}}


In the response, we received `AudioStream`, which we will save to a file.

In [25]:
with closing(response['AudioStream']) as s:
    audio = s.read()
    with open('./en-US-polly-example.mp3', 'wb') as f:
        f.write(audio)
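
Reading the whole stream with a single `read()` is fine for short clips. For longer audio, it may be preferable to copy the stream in chunks so that the full file never has to sit in memory. Below is a minimal sketch of that idea. `save_stream` is a hypothetical helper, and an `io.BytesIO` object stands in for `response['AudioStream']` so the snippet runs offline; the only assumption about the real stream is that it supports `read(size)`, as `StreamingBody` does.

```python
import io
import os
import tempfile


def save_stream(stream, path, chunk_size=4096):
    """Copy a file-like audio stream to `path` in chunks; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read means the stream is exhausted
                break
            f.write(chunk)
            written += len(chunk)
    return written


# Stand-in for response['AudioStream'] (assumption: any object with read(size)).
fake_stream = io.BytesIO(b'\x00' * 10000)
out_path = os.path.join(tempfile.gettempdir(), 'polly-chunked-example.bin')
bytes_written = save_stream(fake_stream, out_path)
print(bytes_written)
```

With a real response, the call would look like `save_stream(response['AudioStream'], './en-US-polly-example.mp3')`, ideally still wrapped in `closing(...)` so the connection is released.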

Now let's try to open the file and play it to test how well Polly did on our example text.

In [26]:
ipd.Audio('./en-US-polly-example.mp3')

Sounds pretty good! In our next example, we will try something a bit harder: synthesizing speech in Polish. For that task, we will pick the `Maja` VoiceId. As the text, we will use a few sentences from one of the blog posts on blog.chmurowisko.pl - [Ansible – Wystartuj z Infrastructure as Code w 30 Sekund. To Dziecinnie Proste!](https://chmurowisko.pl/ansible-wystartuj-infrastructure-as-code-30-sekund-dziecinnie-proste/)

In [28]:
text = """
Słyszałeś o Ansible, ale nie wiesz, jak zacząć?
To banalnie proste. Zapomnij o opasłych książkach, wielogodzinnych tutorialach i latach praktyki.
Wszystko, czego potrzebujesz, aby wystartować z Ansible,
to kilka minut poświęconych na przeczytanie naszego artykułu. Gotowy?
"""

response = polly_client.synthesize_speech(
    OutputFormat='mp3',
    Text=text,
    VoiceId='Maja'
)
pprint(response)

with closing(response['AudioStream']) as s:
    audio = s.read()
    with open('./pl-PL-polly-example.mp3', 'wb') as f:
        f.write(audio)

{'AudioStream': <botocore.response.StreamingBody object at 0x7f0da9e41fd0>,
 'ContentType': 'audio/mpeg',
 'RequestCharacters': '272',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-type': 'audio/mpeg',
                                      'date': 'Sat, 24 Mar 2018 18:41:47 GMT',
                                      'transfer-encoding': 'chunked',
                                      'x-amzn-requestcharacters': '272',
                                      'x-amzn-requestid': '01e0608e-2f93-11e8-a667-7738a0edec97'},
                      'HTTPStatusCode': 200,
                      'RequestId': '01e0608e-2f93-11e8-a667-7738a0edec97',
                      'RetryAttempts': 0}}


Now let's try to open the file and play it to test how well Polly did with Polish!

In [29]:
ipd.Audio('./pl-PL-polly-example.mp3')

As we can hear, `Maja` has a problem pronouncing the English word `Ansible`. Let's try to help her by using lexicons!

### Using lexicons

When synthesizing speech, Polly can apply lexicons defined according to the PLS (Pronunciation Lexicon Specification). To manage lexicons, the client library offers four methods: `get_lexicon`, `list_lexicons`, `put_lexicon` and `delete_lexicon`. Let's see how they work.

In our example, we will add a lexicon that helps `Maja` pronounce the word `Ansible` correctly.

In [40]:
with open('./ansible_lexicon.pls', encoding='utf-8') as f:
    lexicon = f.read()

pprint(lexicon)

('<?xml version="1.0" encoding="UTF-8"?>\n'
 '<lexicon version="1.0" \n'
 '      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
 '      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" \n'
 '      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon \n'
 '        '
 'http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
 '      alphabet="ipa" xml:lang="pl-PL">\n'
 '  <lexeme>\n'
 '    <grapheme>Ansible</grapheme>\n'
 '    <phoneme>ansibl</phoneme>\n'
 '  </lexeme>\n'
 '</lexicon>\n')
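
Instead of keeping the lexicon in a hand-written file, the same document can be generated from a list of word and pronunciation pairs. The sketch below is one possible way to do that, not the approach used above. `build_lexicon` is a hypothetical helper, and it leaves out the `xsi:schemaLocation` hint present in the file. I have not verified that Polly accepts a lexicon without that attribute, so add it back if `put_lexicon` rejects the result. The snippet parses its own output with the standard library to check that it is well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = 'http://www.w3.org/2005/01/pronunciation-lexicon'


def build_lexicon(entries, lang, alphabet='ipa'):
    """Return a PLS document mapping each grapheme to a phoneme string."""
    lexemes = ''.join(
        '  <lexeme>\n'
        '    <grapheme>{}</grapheme>\n'
        '    <phoneme>{}</phoneme>\n'
        '  </lexeme>\n'.format(escape(grapheme), escape(phoneme))
        for grapheme, phoneme in entries
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '      xmlns="{}"\n'
        '      alphabet="{}" xml:lang="{}">\n'
        '{}'
        '</lexicon>\n'
    ).format(PLS_NS, alphabet, lang, lexemes)


# Same single entry as in ansible_lexicon.pls above.
lexicon_xml = build_lexicon([('Ansible', 'ansibl')], lang='pl-PL')

# Sanity check: the generated text parses and contains the expected entry.
root = ET.fromstring(lexicon_xml.encode('utf-8'))
graphemes = [g.text for g in root.iter('{%s}grapheme' % PLS_NS)]
phonemes = [p.text for p in root.iter('{%s}phoneme' % PLS_NS)]
print(graphemes, phonemes)
```

The resulting string could then be passed as the `Content` argument of `put_lexicon`.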


In [41]:
response = polly_client.put_lexicon(
    Name='AnsibleLexicon',
    Content=lexicon
)

pprint(response)

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '2',
                                      'content-type': 'application/json',
                                      'date': 'Sat, 24 Mar 2018 18:59:17 GMT',
                                      'x-amzn-requestid': '73d03eca-2f95-11e8-bc44-abbcf813b055'},
                      'HTTPStatusCode': 200,
                      'RequestId': '73d03eca-2f95-11e8-bc44-abbcf813b055',
                      'RetryAttempts': 0}}


Let's see if our lexicon is now available.

In [42]:
response = polly_client.get_lexicon(
    Name='AnsibleLexicon'
)

pprint(response)

{'Lexicon': {'Content': '<?xml version="1.0" encoding="UTF-8"?>\n'
                        '<lexicon version="1.0" \n'
                        '      '
                        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
                        '      '
                        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" \n'
                        '      '
                        'xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon \n'
                        '        '
                        'http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
                        '      alphabet="ipa" xml:lang="pl-PL">\n'
                        '  <lexeme>\n'
                        '    <grapheme>Ansible</grapheme>\n'
                        '    <phoneme>ansibl</phoneme>\n'
                        '  </lexeme>\n'
                        '</lexicon>\n',
             'Name': 'AnsibleLexicon'},
 'LexiconAttributes': {'Alphabet': 'ipa',

Now that our lexicon is ready, let's synthesize the Polish text once more, this time using the lexicon for the correct pronunciation of `Ansible`.

In [43]:
text = """
Słyszałeś o Ansible, ale nie wiesz, jak zacząć?
To banalnie proste. Zapomnij o opasłych książkach, wielogodzinnych tutorialach i latach praktyki.
Wszystko, czego potrzebujesz, aby wystartować z Ansible,
to kilka minut poświęconych na przeczytanie naszego artykułu. Gotowy?
"""

response = polly_client.synthesize_speech(
    OutputFormat='mp3',
    Text=text,
    VoiceId='Maja',
    LexiconNames=['AnsibleLexicon']
)
pprint(response)

with closing(response['AudioStream']) as s:
    audio = s.read()
    with open('./pl-PL-with-lexicon-polly-example.mp3', 'wb') as f:
        f.write(audio)

{'AudioStream': <botocore.response.StreamingBody object at 0x7f0daa26c128>,
 'ContentType': 'audio/mpeg',
 'RequestCharacters': '272',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-type': 'audio/mpeg',
                                      'date': 'Sat, 24 Mar 2018 19:03:28 GMT',
                                      'transfer-encoding': 'chunked',
                                      'x-amzn-requestcharacters': '272',
                                      'x-amzn-requestid': '09b95235-2f96-11e8-9d44-e93562628450'},
                      'HTTPStatusCode': 200,
                      'RequestId': '09b95235-2f96-11e8-9d44-e93562628450',
                      'RetryAttempts': 0}}


In [44]:
ipd.Audio('./pl-PL-with-lexicon-polly-example.mp3')

As we can hear, `Maja` now pronounces `Ansible` much better in our example!
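
The two remaining lexicon methods, `list_lexicons` and `delete_lexicon`, were not exercised above. The sketch below shows how they might be combined to clean up once a lexicon is no longer needed. `lexicon_names` and `remove_lexicon` are hypothetical helpers, and `FakePollyClient` is an in-memory stand-in that exists only so the snippet runs without AWS access. With a real client, pass `polly_client` instead.

```python
def lexicon_names(client):
    """Collect the names of all stored lexicons, following NextToken pagination."""
    names = []
    kwargs = {}
    while True:
        page = client.list_lexicons(**kwargs)
        names.extend(item['Name'] for item in page.get('Lexicons', []))
        token = page.get('NextToken')
        if not token:
            return names
        kwargs = {'NextToken': token}


def remove_lexicon(client, name):
    """Delete the lexicon if it exists; return True when a delete was issued."""
    if name in lexicon_names(client):
        client.delete_lexicon(Name=name)
        return True
    return False


class FakePollyClient:
    """Hypothetical offline stand-in mimicking the two client calls used here."""

    def __init__(self, names):
        self._names = list(names)

    def list_lexicons(self, **kwargs):
        return {'Lexicons': [{'Name': n} for n in self._names]}

    def delete_lexicon(self, Name):
        self._names.remove(Name)
        return {}


fake = FakePollyClient(['AnsibleLexicon'])
first = remove_lexicon(fake, 'AnsibleLexicon')   # lexicon exists, so it is deleted
second = remove_lexicon(fake, 'AnsibleLexicon')  # already gone, nothing to do
print(first, second, lexicon_names(fake))
```

Checking the list first avoids issuing a delete for a lexicon that does not exist, at the cost of one extra request.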