Automatically convert TTS audio to MP3 on demand #102814
Hey there @balloob, mind taking a look at this pull request as it has been labeled with an integration (…).
Hey there @home-assistant/core, @pvizeli, mind taking a look at this pull request as it has been labeled with an integration (…).
TODO: The TTS memory and file caches need to know about the multiple formats available. At the moment, the converted MP3 replaces the original entry in the cache, so the original files are not accessible without probing the cache directory.
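The TODO above essentially asks for a cache whose key includes the output format, so the engine's native audio and the converted MP3 can be stored side by side. A minimal sketch, assuming a hypothetical in-memory cache keyed by `(message_hash, extension)`; `message_hash`, `AudioCache`, `store`, `get` and `formats` are illustrative names, not the TTS integration's actual API.

```python
from __future__ import annotations

import hashlib


def message_hash(message: str, engine: str, language: str) -> str:
    """Hash the inputs that determine the generated speech (hypothetical scheme)."""
    raw = f"{engine}|{language}|{message}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


class AudioCache:
    """Hypothetical in-memory cache with one entry per (message, format) pair."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], bytes] = {}

    def store(self, key: str, extension: str, data: bytes) -> None:
        # Including the extension in the key lets the engine's native audio
        # and the converted MP3 coexist, instead of one replacing the other.
        self._entries[(key, extension)] = data

    def get(self, key: str, extension: str) -> bytes | None:
        """Return cached audio in the requested format, or None if missing."""
        return self._entries.get((key, extension))

    def formats(self, key: str) -> list[str]:
        """List the formats cached for a message without probing a directory."""
        return sorted(ext for (k, ext) in self._entries if k == key)
```

With this layout, storing a WAV and then its MP3 conversion under the same message hash leaves both retrievable, and `formats()` reports `['mp3', 'wav']`.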
Branch updated from 88c0629 to 6eaf9bc.
```python
if proc.returncode != 0:
    _LOGGER.error(stderr.decode())
    raise RuntimeError(
        f"Unexpected error while running ffmpeg with arguments: {command}. See log for details."
    )
```
Please break long strings so that lines stay within a maximum of 88 characters.
Breaking change
The `ATTR_AUDIO_OUTPUT` attribute is deprecated. It previously told the TTS system which audio format to generate, and has been superseded by `ATTR_PREFERRED_FORMAT` (see below).

Unless `ATTR_PREFERRED_FORMAT` is given, TTS audio will always be converted to (or kept in) MP3 format.

All TTS audio generation is now non-blocking. A media source ID/URL is returned immediately while the TTS audio is generated in the background. Resolving the media or fetching the URL blocks until generation has finished.
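The non-blocking behavior described above follows a common asyncio pattern: start generation as a background task, hand back an identifier straight away, and make resolution await that task. A minimal sketch, assuming a hypothetical `TTSManager` class and a fake `_generate` coroutine standing in for the real engine and conversion; none of these names come from the Home Assistant code.

```python
from __future__ import annotations

import asyncio
import uuid


class TTSManager:
    """Hypothetical manager that returns media IDs before audio is ready."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Task[bytes]] = {}

    async def _generate(self, message: str) -> bytes:
        # Stand-in for a slow TTS engine call plus format conversion.
        await asyncio.sleep(0.01)
        return f"audio:{message}".encode()

    def get_media_id(self, message: str) -> str:
        """Return an ID immediately; generation continues in the background."""
        # The ".mp3" suffix mirrors the MP3 default; the ID scheme is made up.
        media_id = f"tts/{uuid.uuid4().hex}.mp3"
        self._pending[media_id] = asyncio.create_task(self._generate(message))
        return media_id

    async def resolve(self, media_id: str) -> bytes:
        """Wait until the audio for media_id has been generated."""
        return await self._pending[media_id]


async def main() -> None:
    manager = TTSManager()
    media_id = manager.get_media_id("Hello")  # returns without waiting
    audio = await manager.resolve(media_id)  # waits for generation to finish
    print(media_id, audio)


if __name__ == "__main__":
    asyncio.run(main())
```

Because `get_media_id` only schedules the task, callers such as a media player can receive the ID at once and pay the generation cost only when they actually fetch the audio.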
Proposed change
Different TTS systems produce audio in different formats, some of which are not compatible with many media players. Some TTS systems support the `ATTR_AUDIO_OUTPUT` attribute to change the output format, but there is no guarantee that a TTS system can generate the requested format. Wyoming, for example, can only generate WAV files.

This PR adds several things to TTS:

- The `ATTR_PREFERRED_FORMAT` option lets the caller select an audio format other than the one the TTS system natively generates, such as "wav". If it is not provided, the format defaults to MP3.
- `ATTR_PREFERRED_SAMPLE_RATE` and `ATTR_PREFERRED_SAMPLE_CHANNELS` allow the caller to control the exact details of the final audio. This is required for ESPHome to stream audio to speakers.

Lastly, the ESPHome voice assistant code has been updated to request 16 kHz, 16-bit mono WAV audio when the audio will be streamed back to the client. This should now work with any TTS system.
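The preferred format, sample rate and channel options map naturally onto ffmpeg's `-f`, `-ar` and `-ac` output options. A minimal sketch of how such a command could be assembled, assuming audio is piped through stdin and stdout; `build_ffmpeg_command` and its parameters are hypothetical, not the integration's actual code. For WAV output it pins the codec to `pcm_s16le` (16-bit little-endian PCM), matching the 16-bit audio the ESPHome case requests.

```python
from __future__ import annotations


def build_ffmpeg_command(
    preferred_format: str | None = None,
    sample_rate: int | None = None,
    channels: int | None = None,
) -> list[str]:
    """Build an ffmpeg command that converts audio read from stdin."""
    output_format = preferred_format or "mp3"  # MP3 unless a format is requested
    command = ["ffmpeg", "-hide_banner", "-i", "pipe:"]
    if sample_rate is not None:
        command += ["-ar", str(sample_rate)]  # output sample rate in Hz
    if channels is not None:
        command += ["-ac", str(channels)]  # number of output channels
    if output_format == "wav":
        command += ["-c:a", "pcm_s16le"]  # 16-bit little-endian PCM samples
    command += ["-f", output_format, "pipe:"]  # write the result to stdout
    return command


if __name__ == "__main__":
    # The ESPHome case from the PR description: 16 kHz, 16-bit mono WAV.
    print(build_ffmpeg_command("wav", 16000, 1))
```

Leaving every option unset yields a plain MP3 conversion, which mirrors the default described in the PR.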
Type of change
Additional information
Checklist
`black --fast homeassistant tests`
If user exposed functionality or configuration variables are added/changed:
If the code communicates with devices, web services, or third-party tools:
Updated and included derived files by running: `python3 -m script.hassfest`
`requirements_all.txt`: updated by running `python3 -m script.gen_requirements_all`
`.coveragerc`
To help with the load of incoming pull requests: