Neural Dialogue Audiolizer is a ".txt to .wav converter" that turns a textual dialogue (e.g. an interview or a chat) between two individuals into an audio dialogue with two freely selectable voices, currently by using any of the following APIs:
- Google Cloud Text-to-Speech API (WaveNet voices only)
- Amazon Polly Text-to-Speech API (neural engine voices only)
- Microsoft Azure Text-to-Speech API (neural voices only).
It was made to run in Google Colaboratory (i.e. your browser), using your Google Drive as data source and storage.
Source text | Google Cloud TTS | Amazon Polly TTS | Microsoft Azure TTS |
---|---|---|---|
gpt-3_chat-1.txt | WAV (loser) | WAV | WAV (winner) |
Valid access keys are required to use any of the supported TTS APIs. More information on obtaining access:
- to Google Cloud TTS API: Before you begin
- to Amazon Polly TTS API: AWS Account and Access Keys
- to Microsoft Azure TTS API: Create the Azure resource
Note that in all of these services, neural voices are available only in specific regions. Select the location accordingly when enabling the service/API where necessary.
Note that costs may apply. At the time of writing, to the best of my knowledge, account creation for all of these services as well as limited monthly usage of these TTS APIs is free of charge, even though billing/credit card information is required upon registration. You should also be aware that each line in each text file you audiolize consumes one TTS API call. TODO: consume only 2 API calls and slice+merge returned audio files in Colab.
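To get a rough idea of how many API calls a file will consume before you run it, you can count its lines. The snippet below is only a sketch, not code from the notebook: `estimate_api_calls` is a hypothetical helper, and it assumes that blank lines do not trigger a call.

```python
def estimate_api_calls(text):
    """Count non-blank lines, assuming one TTS request per such line.

    Assumption: blank lines (used as speaker separators) cost nothing.
    """
    return sum(1 for line in text.splitlines() if line.strip())


sample = "John: Hello Bob!\nBob: Hi John.\n\nJohn: How are you?"
print(estimate_api_calls(sample))  # 3
```

For a real file, read it first (e.g. `open(path, encoding="utf-8").read()`) and pass the contents to the function.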
Input should be the path to a .txt file located in your Google Drive, containing the dialogue in one of the following formats, with no other text. If your input material is a copy-paste from the interwebs, make sure to clean it up first so that it strictly follows one of these formats.
- `question_and_answer` expects an empty line every time the speaker changes. See example
- `dialogue_with_names` expects `Name:` (e.g. `John: Hello Bob! How are you?`) every time the speaker changes. The speaker is switched regardless of the name at the beginning of the line, i.e. if two consecutive lines begin with `John:`, the notebook still interprets the second as Bob, and your result is messed up. This will be improved in the distant future, perhaps. See example
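The two formats above can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not the notebook's actual implementation: the function names, the `NAME_PREFIX` pattern and the `(speaker, line)` output shape are all hypothetical. It does reproduce the behaviour described above, including the strict alternation in `dialogue_with_names`.

```python
import re

# Hypothetical pattern for a "Name:" prefix; the notebook's real rule may differ.
NAME_PREFIX = re.compile(r"^([^:\s][^:]*):\s*(.*)$")


def parse_question_and_answer(text):
    """Return (speaker, line) pairs; an empty line switches the speaker."""
    turns = []
    speaker = 0
    pending_switch = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Only switch if some text has already been seen.
            pending_switch = bool(turns)
            continue
        if pending_switch:
            speaker = 1 - speaker
            pending_switch = False
        turns.append((speaker, line))
    return turns


def parse_dialogue_with_names(text):
    """Return (speaker, line) pairs; every "Name:" prefix switches the speaker.

    The name itself is ignored, so two consecutive "John:" lines end up
    assigned to different speakers, as described above.
    """
    turns = []
    speaker = 1  # the first "Name:" line flips this to speaker 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = NAME_PREFIX.match(line)
        if match:
            speaker = 1 - speaker
            line = match.group(2)
        elif not turns:
            speaker = 0  # unprefixed first line goes to the first speaker
        if line:
            turns.append((speaker, line))
    return turns


print(parse_dialogue_with_names("John: Hello Bob!\nJohn: Still me."))
# [(0, 'Hello Bob!'), (1, 'Still me.')]
```

Speaker `0` would map to the first selected voice and speaker `1` to the second. The last call shows why the input has to be cleaned up first: the second `John:` line is assigned to the other speaker.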
This notebook has only English and Finnish voices by default. To add other languages, add the correct language names to the `p1_voice` and `p2_voice` menus from the Google Cloud TTS voice list, Amazon Polly TTS voice list or Microsoft Azure TTS voice list.
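In Colab, drop-down menus like these are usually defined with `#@param` form annotations, so adding a voice typically means adding an entry to the option list. The lines below are only an illustration: the exact format of the entries in this notebook's menus is an assumption, and the German voices are just examples picked from the Google Cloud TTS voice list. Check the existing menu entries in the notebook and follow their format.

```python
# Hypothetical form fields; the notebook's real menus may format entries differently.
p1_voice = "en-US-Wavenet-D"  #@param ["en-US-Wavenet-D", "de-DE-Wavenet-B"]
p2_voice = "en-US-Wavenet-F"  #@param ["en-US-Wavenet-F", "de-DE-Wavenet-C"]
```

Outside Colab the `#@param` part is an ordinary comment, so the variables simply hold the default values.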