
Is there a way to create consistent voices? #11

Open
BigArty opened this issue Apr 11, 2024 · 12 comments

Comments

@BigArty commented Apr 11, 2024

I want to make an app that would read long texts in chunks. For this I need to get the same voice for the same speaker prompt. Now I get similar but still not the same voices each generation. Is it possible to somehow fix the voice?

@sanchit-gandhi (Collaborator)

This is a very valid feature request @BigArty. What we're thinking of doing is fine-tuning the model on a single speaker to fix the speaker's voice, and then controlling a subset of features (speed, tone, background noise) through the text prompt. Would this work for you?

@baimu1010 commented Apr 11, 2024

+1, I need it too.

@BigArty (Author) commented Apr 11, 2024

@sanchit-gandhi Yes, that would be great! The ability to control the tone and speed is also very convenient.

@boveyking

Really good project, thank you guys. To be useful in the real world, it's a must to be able to use a voice_id (or speaker id, whatever) to produce a consistent voice.

@sladec commented Apr 12, 2024

Nice work guys! The ability to generate speech from the same speaker would be really useful.
Accent control would also be a +1.

@adamfils

@sanchit-gandhi
It would also be nice to use seeds to get consistent voices.

@bkutasi commented Apr 15, 2024

In my opinion this is one of the most important features of a TTS, so I would love to see it integrated.

@sanchit-gandhi (Collaborator)

We have a first single-speaker fine-tuned checkpoint: https://huggingface.co/ylacombe/parler-tts-mini-jenny-30H

It could be useful if you want a specific voice, in this case Jenny (she's Irish ☘️). Usage is more or less the same as Parler-TTS v0.1; just specify the keyword “Jenny” in the voice description:

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer, set_seed
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("ylacombe/parler-tts-mini-jenny-30H").to(device)
tokenizer = AutoTokenizer.from_pretrained("ylacombe/parler-tts-mini-jenny-30H")

prompt = "Hey, how are you doing today? My name is Jenny, and I'm here to help you with any questions you have."
description = "Jenny speaks at an average pace with an animated delivery in a very confined sounding environment with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

set_seed(42)
# specify min length to avoid 0-length generations
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids, min_length=10)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

@baimu1010

Can I control the output of the model by setting a seed, the same way I would in a text2image model?
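Seed control does work here, with a caveat. A minimal sketch of the principle (assuming only `transformers`' `set_seed`, which the snippet earlier in this thread already imports): re-seeding resets the RNG state, so sampling is reproducible for identical inputs, but a different text prompt still changes the sampled audio, so a seed alone does not pin down the voice the way a single-speaker fine-tuned checkpoint does.

```python
import torch
from transformers import set_seed

# set_seed seeds Python's, NumPy's and PyTorch's RNGs in one call, so any
# sampling that follows it is reproducible on the same software/hardware stack.
set_seed(42)
a = torch.rand(3)

set_seed(42)
b = torch.rand(3)

# The two draws are identical because the RNG state was reset in between.
assert torch.equal(a, b)

# For Parler-TTS, the analogous pattern is to call set_seed(42) immediately
# before every model.generate(...) call; with the same description and prompt,
# the generated audio is then deterministic.
```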

@Kredithaichen

Having a checkpoint for a consistent voice sounds great! A bit of a dummy question on my end, though, @sanchit-gandhi: how can I use this new checkpoint? Where would I need to place the .safetensors file in order to load it with the script you provided? Is there a tutorial or getting-started document out there?

@furqan4545

Sanchit, thank you so much for the amazing work. You guys are the true face of open-source contribution. I had the same feature request: a consistent speaker ID, and being able to generate long-form speech with a consistent voice. I mean, I don't want to break my text into chunks and then feed it to the model. It would be great if the model could take care of the chunking itself and generate long-form speech. Plus, controlling emotions and speed through text is an amazing feature.

@atb29 commented Jun 2, 2024

> Sanchit, thank you so much for the amazing work. You guys are the true face of open-source contribution. I had the same feature request: a consistent speaker ID, and being able to generate long-form speech with a consistent voice. I mean, I don't want to break my text into chunks and then feed it to the model. It would be great if the model could take care of the chunking itself and generate long-form speech. Plus, controlling emotions and speed through text is an amazing feature.

Are there any updates on that? I have the same issue: I don't want to chunk the text myself, feed it to the model, and get an inconsistent voice with a different tone and rhythm per chunk. It would be great if the model chunked the text itself and then used a consistent voice throughout the whole text.
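Until built-in chunking lands, a workable pattern is to split the text on sentence boundaries yourself and re-seed before each chunk, so every chunk is sampled with the same description and the same RNG state. A minimal sketch (the `chunk_text` helper below is hypothetical, not part of Parler-TTS; the commented loop assumes the Jenny checkpoint and the variable names from the snippet earlier in this thread):

```python
import re
import numpy as np  # used to concatenate per-chunk audio in the loop below

def chunk_text(text, max_chars=300):
    """Split text on sentence boundaries into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole (no hard split).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Hypothetical generation loop: re-use the same description input_ids and
# re-seed before every chunk so the sampled voice stays as close as possible:
#
#   pieces = []
#   for chunk in chunk_text(long_text):
#       set_seed(42)  # same seed before each chunk
#       prompt_ids = tokenizer(chunk, return_tensors="pt").input_ids.to(device)
#       generation = model.generate(input_ids=input_ids,
#                                   prompt_input_ids=prompt_ids, min_length=10)
#       pieces.append(generation.cpu().numpy().squeeze())
#   audio = np.concatenate(pieces)
#   sf.write("long_form.wav", audio, model.config.sampling_rate)
```

Note that even with a fixed seed the voice can still drift slightly between chunks, since each chunk is a fresh sampling run; a single-speaker checkpoint like the Jenny one above is the more reliable anchor, and the two techniques combine well.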
