Commit

Parler TTS (#2114)
CVS-141596
aleksandr-mokrov committed Jun 13, 2024
1 parent b6728f5 commit 59b1b2f
Showing 4 changed files with 613 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -401,6 +401,7 @@ LSTM
LSTMs
Luo
LVLM
+Lyth
macOS
Magika
Mahalanobis
@@ -543,6 +544,7 @@ panoptic
parallelized
parameterization
parametrize
+Parler
parsers
perceptron
Patil
31 changes: 31 additions & 0 deletions notebooks/parler-tts-text-to-speech/README.md
@@ -0,0 +1,31 @@
# Text-to-speech (TTS) with Parler-TTS and OpenVINO™

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). It reproduces work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com/) by Dan Lyth and Simon King, from Stability AI and the University of Edinburgh respectively.

![Model architecture](https://images.squarespace-cdn.com/content/v1/657816dfbefe0533e8a69d9a/30c96e25-acc5-4019-acdd-648da6142c4c/architecture_v3.png?format=2500w)

Text-to-speech models trained on large-scale datasets have demonstrated impressive in-context learning capabilities and naturalness. However, controlling speaker identity and style in these models typically requires conditioning on reference speech recordings, which limits creative applications. Natural-language prompting of speaker identity and style is an alternative that has shown promising results and offers an intuitive method of control, but its reliance on human-labeled descriptions prevents it from scaling to large datasets.

This work bridges the gap between these two approaches. The authors propose a scalable method for labeling various aspects of speaker identity, style, and recording conditions, apply it to a 45k-hour dataset, and use the labeled data to train a speech language model. They also propose simple methods for increasing audio fidelity, with which their approach significantly outperforms recent work despite relying entirely on found data.
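The idea of labeling speech automatically instead of by hand can be illustrated with a minimal sketch, assuming each utterance already comes with numeric measurements (here hypothetical `pitchHz` and `wordsPerSecond` fields) that are mapped to coarse words and joined into a description. The thresholds, field names, and sentence template are made up for illustration; a fixed template stands in for whatever text generation the authors actually use, and the paper's real features and bins are not reproduced here.

```javascript
// Hypothetical bins: thresholds and wording are illustrative only,
// not the ones used in the paper or in Parler-TTS training data.
const PITCH_BINS = [
  { max: 120, label: 'low-pitched' },
  { max: 200, label: 'moderately pitched' },
  { max: Infinity, label: 'high-pitched' },
];

const RATE_BINS = [
  { max: 2.0, label: 'slowly' },
  { max: 3.0, label: 'at a moderate pace' },
  { max: Infinity, label: 'quickly' },
];

// Return the label of the first bin whose upper bound exceeds `value`.
function binLabel(value, bins) {
  const bin = bins.find((b) => value < b.max);
  if (!bin) {
    // Only reachable for values such as NaN that compare false with every bound.
    throw new RangeError(`No bin for value ${value}`);
  }
  return bin.label;
}

// Map one utterance's (hypothetical) measurements to a natural-language description.
function describeUtterance({ gender, pitchHz, wordsPerSecond }) {
  const pitch = binLabel(pitchHz, PITCH_BINS);
  const rate = binLabel(wordsPerSecond, RATE_BINS);
  return `A ${pitch} ${gender} speaker talks ${rate}.`;
}

console.log(describeUtterance({ gender: 'female', pitchHz: 230, wordsPerSecond: 1.5 }));
// prints "A high-pitched female speaker talks slowly."
```

The property that matters here is that every step is automatic, which is what allows this kind of labeling to scale to a dataset of tens of thousands of hours, unlike human-written descriptions.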


[GitHub repository](https://github.com/huggingface/parler-tts)

[Hugging Face page](https://huggingface.co/parler-tts)


## Notebook Contents

This notebook demonstrates how to convert and run the Parler-TTS model using OpenVINO.

The notebook contains the following steps:
1. Load the original model and run inference.
2. Convert the model to OpenVINO IR.
3. Compile the models and run inference.
4. Run interactive inference.

## Installation instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).
579 changes: 579 additions & 0 deletions notebooks/parler-tts-text-to-speech/parler-tts-text-to-speech.ipynb


2 changes: 1 addition & 1 deletion selector/src/shared/notebook-tags.js
@@ -17,6 +17,7 @@ export const TASKS = /** @type {const} */ ({
TEXT_TO_VIDEO: 'Text-to-Video',
VIDEO_TO_TEXT: 'Video-to-Text',
TEXT_TO_AUDIO: 'Text-to-Audio',
+TEXT_TO_SPEECH: 'Text-to-Speech',
AUDIO_TO_TEXT: 'Audio-to-Text',
VISUAL_QUESTION_ANSWERING: 'Visual Question Answering',
IMAGE_CAPTIONING: "Image Captioning",
@@ -61,7 +62,6 @@ export const TASKS = /** @type {const} */ ({
AUDIO_GENERATION: 'Audio Generation',
AUDIO_CLASSIFICATION: 'Audio Classification',
VOICE_ACTIVITY_DETECTION: 'Voice Activity Detection',
-AUDIO_CLASSIFICATION: "Audio Classification",
},
OTHER: {
KNOWLEDGE_REPRESENTATION: 'Knowledge Representation',
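The `AUDIO_CLASSIFICATION` line removed in the second hunk repeats a key that already appears two lines above it in the same object, with the same string value. A minimal check, assuming a trimmed stand-in object that reproduces only the four entries from that hunk (the enclosing group's real name is not visible in the diff, so the variable names below are hypothetical), shows that the deletion leaves the object unchanged: when an object literal repeats a key, the later value overwrites the earlier one and the key keeps its original position.

```javascript
// Hypothetical stand-in for the audio entries of TASKS in
// selector/src/shared/notebook-tags.js *before* this commit.
// Only the four lines from the second hunk are reproduced.
const audioTasksBefore = {
  AUDIO_GENERATION: 'Audio Generation',
  AUDIO_CLASSIFICATION: 'Audio Classification',
  VOICE_ACTIVITY_DETECTION: 'Voice Activity Detection',
  // Duplicate key (the line this commit removes): the later value
  // overwrites the earlier one, but the key keeps its first position.
  AUDIO_CLASSIFICATION: "Audio Classification",
};

// The same entries *after* this commit.
const audioTasksAfter = {
  AUDIO_GENERATION: 'Audio Generation',
  AUDIO_CLASSIFICATION: 'Audio Classification',
  VOICE_ACTIVITY_DETECTION: 'Voice Activity Detection',
};

// Two plain objects hold the same data if their [key, value] pairs match in order.
function sameEntries(a, b) {
  return JSON.stringify(Object.entries(a)) === JSON.stringify(Object.entries(b));
}

console.log(Object.keys(audioTasksBefore).length); // 3
console.log(sameEntries(audioTasksBefore, audioTasksAfter)); // true
```

Modern JavaScript accepts duplicate keys in object literals, even in strict mode, so the old file ran without error; the removed line was redundant rather than a behavior change. Linters such as ESLint's `no-dupe-keys` rule report this pattern.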
