Voices in the Machine - AI Speech Generation

From monotone march to expressive symphony, AI-powered voices whisper possibilities: audiobooks with the author's soul, stories narrated in forgotten tongues, and connections beyond the veil. This project covers everything you need to get started with Text-to-Speech AI, exploring its technical underpinnings, its recent advancements, and its diverse applications across various industries. We will examine the ethical considerations surrounding voice cloning and the future potential of this technology to reshape how we interact with information and create content.

Understanding Text To Speech

At the heart of the digital soundscape lies Text-to-Speech (TTS) AI, which transforms written words into spoken audio. While the final output may sound seamless, several components work together behind the scenes. Let's break down this technological symphony:

Text Pre-Processing
Text to Phoneme Conversion
Prosody Prediction
Speech Synthesis
Post Processing
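
To make the first stages above more concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular TTS engine is implemented: the lookup tables, the function names, and the "<unk>" fallback are all assumptions made up for this example. Real systems use large pronunciation lexicons and learned models, and the last two stages (speech synthesis and post-processing) are left out because they require a trained model.

```python
import re

# Hypothetical, tiny lookup tables for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
PHONEMES = {  # ARPAbet-style entries, hand-written for this example
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}


def preprocess(text: str) -> list[str]:
    """Text pre-processing: lowercase, expand abbreviations, spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        # Drop punctuation; keep letters, digits and apostrophes.
        token = re.sub(r"[^a-z0-9']", "", token)
        if token.isdigit():
            # Naive: read numbers digit by digit ("12" -> "one two").
            words.extend(DIGITS[int(d)] for d in token)
        elif token:
            words.append(token)
    return words


def to_phonemes(words: list[str]) -> list[list[str]]:
    """Text-to-phoneme conversion via dictionary lookup, "<unk>" if missing."""
    return [PHONEMES.get(word, ["<unk>"]) for word in words]


def predict_prosody(text: str) -> dict:
    """Prosody prediction, reduced to one rule: questions rise, the rest fall."""
    contour = "rising" if text.strip().endswith("?") else "falling"
    return {"contour": contour}


sentence = "Dr. Smith has 2 cats."
words = preprocess(sentence)
print(words)
print(to_phonemes(words))
print(predict_prosody(sentence))
```

In a neural TTS system most of these hand-written rules are replaced by learned components, but the flow of data from raw text to normalized words to phonemes and prosody is roughly the same.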

Evolution of Text-to-Speech

For centuries, attempts to make machines speak produced voices that sounded like robots stuck on repeat. From bellows and reeds to early digital squawks, text-to-speech was more sci-fi nightmare than technological marvel. But then came the AI symphony. Deep learning algorithms, trained on vast libraries of human voices, now generate speech so nuanced and expressive that it rivals human speech. This newfound eloquence unlocks a treasure trove of possibilities: from empowering the visually impaired to narrating audiobooks with the author's touch, AI voices are shaping how we consume, create, and even grieve. As ethical frameworks guide its development, this technological symphony promises to reshape communication, amplify diverse voices, and weave a richer tapestry of human connection.

Text-to-Speech Tools

TTS technology, as discussed earlier, is not new. However, with advances in AI, the generated output has become far more natural, blurring the line between recorded and generated speech.

There are countless tools for trying out Text-to-Speech, both open-source and commercial. Here are some of the most widely used; Bark and HierSpeech++ are open source, while PlayHT and ElevenLabs are commercial services:

Bark: Text-Prompted Generative Audio Model
PlayHT: AI Voice Generator
HierSpeech++: The official implementation of HierSpeech++
ElevenLabs: Text to Speech & AI Voice Generator

Tutorial

Text-to-Speech using PlayHT

Let's start with the easiest way to use voice cloning and TTS: PlayHT.

Visit PlayHT and create a free account. The service allows you to clone a single voice for free and generate speech from text.

PlayHT lets you generate speech with one of its existing voices or with a newly cloned voice. To use an existing voice, click on the name of the voice above the text input, then search for and select any voice you like. They have amazing voices that you can try out to narrate blocks of text you provide.

The real fun is using your own voice or a voice you want to clone. The tool allows you to do just that. Click on "Voice Cloning" and follow the simple steps provided.

Click on "Instant" to create a clone from a "30 Sec" audio recording.

Then click on "Create New Model" and select the "PlayHT 2.0" model. Now when you click the name of the voice as before you will be able to select your newly cloned voice.

Then, add your text and click "Generate Speech" or hit the Play button.

Text-to-Speech using Bark

Bark is Suno's text-to-audio model that's capable of generating highly realistic speech from text. Bark goes beyond the basics, effortlessly generating natural-sounding, multilingual speech. But it doesn't stop there – it can create all sorts of audio, from music and background noise to simple sound effects. Bark even adds a human touch with nonverbal cues like laughter, sighs, and crying.

To get started, click the link to visit the Google Colab notebook.

The interface is pretty straightforward: hit the play button beside each of the "Cells", the greyed areas that have code inside them. You can try out various voices and languages. For a list of supported voices, check out Bark's voice prompt library.

An interesting feature of Bark is its ability to incorporate non-speech sounds such as laughter, sighs, and music (although music is not great currently). The following cues can be placed in the text prompt:

[laughter]

[laughs]

[sighs]

[music]

[gasps]

[clears throat]

— or ... for hesitations

♪ for song lyrics

CAPITALIZATION for emphasis of a word
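
The cues above are simply written into the text prompt. Below is a minimal sketch of how that might look outside the Colab notebook, assuming the bark package and scipy are installed. The generate_audio, preload_models and SAMPLE_RATE names and the "v2/en_speaker_6" voice preset follow Bark's own README; build_prompt and synthesize are hypothetical helpers written for this example.

```python
def build_prompt(text: str, cue: str = "", emphasize: str = "") -> str:
    """Compose a Bark prompt: upper-case one word for emphasis and
    append an optional non-speech cue such as "[laughs]"."""
    if emphasize:
        text = text.replace(emphasize, emphasize.upper())
    return f"{text} {cue}".strip()


def synthesize(prompt: str,
               out_path: str = "bark_generation.wav",
               voice: str = "v2/en_speaker_6") -> None:
    """Generate speech for the prompt and save it as a WAV file."""
    # Imported lazily so build_prompt can be used without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads the model weights on first use
    audio = generate_audio(prompt, history_prompt=voice)
    write_wav(out_path, SAMPLE_RATE, audio)


prompt = build_prompt("Well ... that was really unexpected.",
                      cue="[laughs]", emphasize="really")
print(prompt)

# Uncomment to generate audio (slow without a GPU):
# synthesize(prompt)
```

Bark treats these cues as hints rather than commands, so the same prompt may or may not produce an audible laugh on a given run.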

Bark has two caveats. First, although it supports voice cloning, it does not provide this feature out of the box. Second, the length of audio you can generate in a single pass is limited. To address these issues, check out the two projects below:

bark-with-voice-clone

bark
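
A common workaround for the length limit is to split the text into sentences, generate each one separately, and join the results. The sketch below assumes the bark package and numpy are installed and uses the generate_audio, preload_models and SAMPLE_RATE names from Bark's README; split_sentences and synthesize_long are hypothetical helpers, and the regex splitter is deliberately naive (it will also split after an ellipsis). It is not taken from either of the projects listed above.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_long(text: str,
                    voice: str = "v2/en_speaker_6",
                    pause_seconds: float = 0.25):
    """Generate each sentence separately and join them with short pauses.

    Returns the concatenated audio array and its sample rate.
    """
    # Imported lazily so split_sentences works without Bark installed.
    import numpy as np
    from bark import SAMPLE_RATE, generate_audio, preload_models

    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")

    preload_models()  # downloads the model weights on first use
    silence = np.zeros(int(pause_seconds * SAMPLE_RATE))
    pieces = []
    for sentence in sentences:
        # Reusing the same voice preset keeps the speaker consistent.
        pieces.append(generate_audio(sentence, history_prompt=voice))
        pieces.append(silence.copy())
    return np.concatenate(pieces), SAMPLE_RATE


print(split_sentences("Hello there. How are you? Fine!"))
```

Very long sentences may still exceed what Bark can generate in one call, so a more robust version would also split on commas or word counts.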

Other Useful Tools

Adobe Podcast: Clean up the generated voices and make them even more realistic.
Mp3Cut: Online MP3 Cutter to cut out a piece of music.
Convertio: Easy tool to convert files online.

Contact

Questions? Feedback? Requests? Discord: Samej2023
