Add support for voice styles to Text-to-Speech #1182

balloob · 2025-01-14T04:14:22Z

balloob
Jan 14, 2025
Maintainer

⚠️ Note: Proposal withdrawn and not implemented. After consideration with the voice team, we decided to just extend the available voices instead of complicating the voice identifier with a 3rd property.

Context

Text-to-Speech models can often generate the same voice in different styles, for example happy, friendly, angry, or sad. Home Assistant is currently only able to expose a single style for each voice.

The Text-to-Speech entities currently allow listing the supported languages and, for each language, the supported voices (docs).

The selected voice is passed as the voice key in the options dictionary when calling the tts.speak action. We have a UI to make this easy to configure in the media browser.
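
To make the current behavior concrete, here is a minimal sketch of the data sent with a tts.speak call, written as a plain Python dict. The entity IDs, message, and voice ID are placeholders chosen for illustration and do not come from this proposal.

```python
# Data for a tts.speak action call, expressed as a plain dict.
# All entity IDs and the voice ID below are illustrative placeholders.
speak_data = {
    "entity_id": "tts.example_engine",  # the TTS entity to use
    "media_player_entity_id": "media_player.living_room",
    "message": "The washing machine is done.",
    "language": "en-US",
    # The selected voice travels inside the options dictionary.
    "options": {"voice": "ExampleVoice"},
}

selected_voice = speak_data["options"]["voice"]
```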

Decision

We extend the Voice object that is returned from the async_get_supported_voices method.

diff --git a/homeassistant/components/tts/models.py b/homeassistant/components/tts/models.py
index 2d693571a0f..0193f955646 100644
--- a/homeassistant/components/tts/models.py
+++ b/homeassistant/components/tts/models.py
@@ -9,3 +9,4 @@ class Voice:
 
     voice_id: str
     name: str
+    variants: list[str]

Variants can only be picked if ATTR_VARIANT is part of the supported_options property.

The variant is passed in the options dict given to async_speak. A voice that has variants available can still be used without specifying a variant; in that case it is up to the integration to use a default variant.
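
The rules above could be sketched roughly as follows. This is an illustration of the proposal only, not Home Assistant code: the Voice class here is a stand-in for the real dataclass, the value "variant" for ATTR_VARIANT is assumed, resolve_variant is a hypothetical helper, and falling back to the first listed variant is just one possible default an integration might choose.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

ATTR_VARIANT = "variant"  # assumed value; the proposal only names the constant


@dataclass
class Voice:
    """Stand-in for the Voice object with the proposed variants field."""

    voice_id: str
    name: str
    variants: list[str]


def resolve_variant(
    voice: Voice, options: dict[str, Any], supported_options: list[str]
) -> str | None:
    """Hypothetical helper: pick the variant an integration would speak with."""
    # Variants can only be picked if the integration lists ATTR_VARIANT
    # among its supported options.
    if ATTR_VARIANT not in supported_options:
        return None
    requested = options.get(ATTR_VARIANT)
    if requested is None:
        # No variant given: the integration chooses a default.
        # Using the first listed variant is an assumption of this sketch.
        return voice.variants[0] if voice.variants else None
    if requested not in voice.variants:
        raise ValueError(f"Unknown variant {requested!r} for voice {voice.voice_id}")
    return requested
```

For a voice with the variants friendly and sad, passing {"variant": "sad"} selects sad, while an empty options dict falls back to the integration default.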

Consequences

The number of available voices/styles that a user can choose from for Text-to-Speech providers will greatly increase.

Example integrations that will benefit:

Piper via Wyoming (see the examples and pick the US English "arctic" voice to hear 18 different styles)
Home Assistant Cloud (Azure available styles)

Alternatives

As an alternative, we could list all styles of a voice as their own voice.

For example, we would list AmyNeural:friendly, AmyNeural:sad, etc. The downside is that this results in very long lists that are difficult to browse.
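
A small sketch of this alternative shows how quickly the list grows. The flatten_voices helper and the "voice:style" separator follow the naming in the example above, but both are hypothetical and not an existing API.

```python
def flatten_voices(voices: dict[str, list[str]]) -> list[str]:
    """Hypothetical helper: list every style of a voice as its own voice."""
    flat: list[str] = []
    for voice_id, styles in voices.items():
        if not styles:
            # A voice without styles is listed once, unchanged.
            flat.append(voice_id)
            continue
        # Each style becomes a separate "voice:style" entry.
        flat.extend(f"{voice_id}:{style}" for style in styles)
    return flat


flat_list = flatten_voices({"AmyNeural": ["friendly", "sad"], "PlainVoice": []})
```

With 18 styles per voice, a provider that offers ten such voices would already present 180 entries in a single list.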

Updates

Jan 13, 2025: original proposal with async_get_supported_styles
Apr 22, 2025: updated to extend object returned from async_get_supported_voices

Answered by frenck

Apr 23, 2025

This is exactly as suggested and pre-approved. 👍

So, with that, this is a go 🚀

../Frenck


AJediIAm · 2025-01-14T07:11:40Z

AJediIAm
Jan 14, 2025

For clarification: is the provided style set in the configuration of the TTS in voice assistant settings or can it also be set as part of the action to augment a message?

It would be nice to include a style as a parameter.

1 reply

AJediIAm Jan 14, 2025

Using voice styles as parameters will allow us to announce the weather in a sad voice when it's raining on a workday and a happy voice when Venus is visible in the night sky.

tetele · 2025-01-14T07:18:24Z

tetele
Jan 14, 2025

What can this be used for? Why do we need it?

Piper via Wyoming (see the examples and pick the US English "arctic" voice to hear 18 different styles)

Those seem to be 18 different voices, not styles. I mean OK, technically they may be styles, but from an application standpoint, they can't really be used as such.

Home Assistant Cloud (Azure available styles)

That's a test environment on your tenant, which can't be accessed by anyone who doesn't have the access rights.

0 replies

frenck · 2025-02-19T21:22:12Z

frenck
Feb 19, 2025
Maintainer

We have discussed this one in the architectural core meeting last week.

The idea/concept is OK to add. However, we think it should be part of the objects of voices we already return. This existing dataclass could, in our opinion, be extended with a property that holds these styles.

Also: Maybe use "variants" or "moods" instead of styles? 🤷

2 replies

balloob Apr 22, 2025
Maintainer Author

Updated proposal to add a variants property to the Voice object.

frenck Apr 23, 2025
Maintainer

This is exactly as suggested and pre-approved. 👍

So, with that, this is a go 🚀

../Frenck

Answer selected by frenck

noxhirsch · 2025-02-20T10:29:06Z

noxhirsch
Feb 20, 2025

In addition to the voice styles, it would also be useful to be able to set the voice rate/speed.

0 replies
