Piper TTS voices make a "puffing" sound instead of pronouncing the letter "L" as well as some punctuation marks. #27

Closed
mikebayus opened this issue Nov 1, 2023 · 4 comments

Comments

@mikebayus

Hi,

I have been using the Piper TTS Voices add-on for NVDA for a while but hadn't reported this, as I thought others might already have done so. Since I found no existing issue, I am reporting it now.

When I use the left and right arrows to scroll character by character, the letter "l" makes a "puffing" sound rather than being pronounced as the letter "l".

Try the word: "alleluia".

Some punctuation marks do this as well.

As I write this, I just had my Piper voice read the lines that had just the letter "l" in them, and the voice handles the letter "l" correctly when reading a sentence. The problem only occurs when using the left and right arrows to scroll one letter at a time.

@rmcpantoja
Contributor

Hi,
Unfortunately, unclear pronunciations are a defect of the VITS model. In this case, you could add slightly shorter audio clips to the dataset and train the model a little more. That should improve results even when reading extremely short texts. Note that the improvement is more noticeable in medium-quality models.
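
To make the "add shorter audio clips to the dataset" suggestion concrete, here is a minimal sketch of how the transcript entries for such clips might be generated. It assumes an LJSpeech-style metadata file with one `id|text` line per clip; that layout, the function names, and the clip IDs are all assumptions made for illustration, not something stated in this thread. The matching audio would still have to be recorded separately.

```python
import string


def short_prompt_rows(extra_texts=()):
    """Return (clip_id, text) pairs for very short recordings.

    The ID scheme (letter_a, short_000, ...) is hypothetical.
    """
    # One entry per letter, so the model sees isolated characters.
    rows = [(f"letter_{letter}", letter) for letter in string.ascii_lowercase]
    # Optional extra short texts, e.g. words that expose the problem.
    for index, text in enumerate(extra_texts):
        rows.append((f"short_{index:03d}", text))
    return rows


def to_metadata(rows):
    """Format rows as pipe-delimited `id|text` lines (assumed layout)."""
    lines = []
    for clip_id, text in rows:
        if "|" in clip_id or "|" in text:
            raise ValueError("the pipe character is reserved as the delimiter")
        lines.append(f"{clip_id}|{text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_metadata(short_prompt_rows(["alleluia"])), end="")
```

Each generated line would need a recording with the same ID before the entries could be appended to an existing dataset and used for further training.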

@mikebayus
Author

mikebayus commented Nov 2, 2023 via email

@mush42
Owner

mush42 commented Nov 2, 2023

Hi @mikebayus
Unfortunately, we can do nothing about this.
This is not an issue with the add-on per se; it is an issue with the underlying model and the dataset used to train it.

@mush42 closed this as not planned Nov 2, 2023
@mush42
Owner

mush42 commented Nov 2, 2023

@mikebayus
Most of these voices are not designed with screen reader use in mind.
The add-on itself can drive any Piper-compatible model. I see no reason why our community couldn't come together, create a dataset for screen readers, and train a voice based on it.


3 participants