# Generate an audio caption for sonification

First, install the `TTS` package if you don't already have it.

In [None]:
!pip install TTS
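If you are unsure whether the package is already present, a quick check along these lines can save a reinstall. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of TTS or STRAUSS.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical usage: only report a problem when the module is missing.
if not is_installed("TTS"):
    print("TTS was not found; run the pip install cell above.")
```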

In [None]:
from TTS.api import TTS
from IPython.display import Audio

The default text-to-speech (TTS) model used in STRAUSS is an English-language female voice with an Irish accent: 'tts_models/en/jenny/jenny'. You can hear a sample below:

In [None]:
Audio('tts_jenny.wav', autoplay=True)

Other voices are available in a range of languages. List the available models, then load one by name to try it:

In [None]:
# List models
TTS.list_models()
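The model list can be long, so it may help to narrow it to one language. The sketch below assumes you have the names as a plain list of strings and that they follow the `tts_models/<language>/<dataset>/<model>` pattern seen in this notebook; the helper `models_for_language` and the short example list are illustrative only, and the exact return type of the listing call may differ between TTS versions.

```python
def models_for_language(model_names, lang_code):
    """Keep TTS model names whose language component equals lang_code."""
    selected = []
    for name in model_names:
        parts = name.split("/")
        # Expected layout (assumption): tts_models/<language>/<dataset>/<model>
        if len(parts) >= 2 and parts[0] == "tts_models" and parts[1] == lang_code:
            selected.append(name)
    return selected


# Example names taken from this notebook, standing in for the full model list.
example_names = [
    "tts_models/en/jenny/jenny",
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/de/thorsten/vits",
]
german_models = models_for_language(example_names, "de")
print(german_models)
```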

In [None]:
# Test example: the following is a US English female voice.
OUTPUT_PATH = 'tts_english.wav'
tts = TTS(model_name='tts_models/en/ljspeech/tacotron2-DDC', progress_bar=False, gpu=False)
tts.tts_to_file(text='The quick brown fox jumped over the lazy dogs.', file_path=OUTPUT_PATH)
Audio(OUTPUT_PATH, autoplay=True)

In [None]:
# Test example: the following is a German male voice.
OUTPUT_PATH = 'tts_german.wav'
tts = TTS(model_name='tts_models/de/thorsten/vits', progress_bar=False, gpu=False)
tts.tts_to_file(text='Der flinke braune Fuchs sprang über die faulen Hunde.', file_path=OUTPUT_PATH)
Audio(OUTPUT_PATH, autoplay=True)


Using punctuation helps create pauses and emphasis in the correct places. Below are two samples, identical apart from the addition of a comma in the second.

In [17]:
Audio('tts_sample.wav', autoplay=True)

In [18]:
Audio('tts_sample_with_comma.wav', autoplay=True)

Let's reset the model to our default:

In [None]:
tts = TTS(model_name='tts_models/en/jenny/jenny', progress_bar=False, gpu=False)

TTS ignores anything it doesn't recognise, such as Greek letters and some mathematical symbols. It can also struggle with multi-digit numbers. It's best to write these out longhand. The example below reads the same sentence twice, first with symbols and digits, then written out in words:

In [None]:
OUTPUT_PATH = 'tts_lya.wav'
tts.tts_to_file(text="The Lyman-α resonance is 1216 Å. The Lyman alpha resonance is twelve hundred and sixteen angstroms.", file_path=OUTPUT_PATH)
Audio(OUTPUT_PATH, autoplay=True)
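Rather than editing each caption by hand, the symbol substitutions could be collected in a small lookup table. This is a sketch only: the `SPOKEN_FORMS` table and the `spell_out_symbols` helper are hypothetical, cover just the two symbols used above, and leave numbers untouched, so multi-digit numbers still need to be written out manually.

```python
# Hypothetical lookup table: symbol -> words to be spoken instead.
SPOKEN_FORMS = {
    "\u03b1": "alpha",      # Greek small letter alpha
    "\u00c5": "angstroms",  # Latin capital A with ring above
    "\u212b": "angstroms",  # dedicated angstrom sign, in case the text uses it
}


def spell_out_symbols(text, spoken_forms=SPOKEN_FORMS):
    """Replace each known symbol in text with its spelled-out form."""
    for symbol, word in spoken_forms.items():
        text = text.replace(symbol, word)
    return text


spoken_caption = spell_out_symbols("The Lyman-\u03b1 resonance is 1216 \u00c5.")
print(spoken_caption)
```

The resulting string can then be passed as the `text` argument of `tts.tts_to_file`, as in the cell above.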

For any words or names that it struggles with, you can adjust the spelling to bring the pronunciation closer. For instance, the Italian name "Chierchia" can be spelled "Kyerkia" to get close to the correct pronunciation.
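One way to keep the written caption intact while still feeding the model a phonetic spelling is to apply respellings just before synthesis. The sketch below is an assumption about workflow, not part of TTS or STRAUSS; `RESPELLINGS` and `respell` are hypothetical names, and the only entry is the example given above.

```python
import re

# Hypothetical table: written form -> phonetic respelling for the TTS model.
RESPELLINGS = {
    "Chierchia": "Kyerkia",
}


def respell(text, respellings=RESPELLINGS):
    """Replace whole-word matches with their phonetic respellings."""
    for word, phonetic in respellings.items():
        # \b keeps the match to whole words, so longer words are left alone.
        text = re.sub(r"\b" + re.escape(word) + r"\b", phonetic, text)
    return text


written = "Data from Chierchia et al."
spoken = respell(written)
print(spoken)
```

Here `written` stays available for display, while `spoken` is the version to hand to the TTS model.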

Now let's try entering a caption:

In [None]:
caption = input("Please enter your caption: ")

In [None]:
OUTPUT_PATH = 'tts_caption.wav'
tts.tts_to_file(text=caption, file_path=OUTPUT_PATH)
Audio(OUTPUT_PATH, autoplay=True)

## Sandbox