Automatically convert TTS audio to MP3 on demand #102814

Merged
balloob merged 16 commits into dev from synestheisam-20231025-tts-autoconvert on Nov 6, 2023

Conversation

synesthesiam
Contributor

@synesthesiam synesthesiam commented Oct 25, 2023

Breaking change

The ATTR_AUDIO_OUTPUT attribute is deprecated. This previously told the TTS system what audio format to generate. It has been superseded by ATTR_PREFERRED_FORMAT (see below).

Unless ATTR_PREFERRED_FORMAT is given, TTS audio will always be converted to (or kept in) MP3 format.

All TTS audio generation is now non-blocking. A media source id/URL will be returned immediately while TTS audio is generated in the background. Resolving the media/fetching the URL will block until generation is finished.
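The non-blocking behaviour described above can be pictured with a small asyncio sketch. This is only an illustration under stated assumptions, not the code from this PR: the `TtsJobs` class, its method names, and the `"tts-1"` id are hypothetical, and the sleep stands in for a real TTS engine.

```python
import asyncio


class TtsJobs:
    """Hypothetical registry mapping a media id to a background generation task."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task[bytes]] = {}

    def start(self, media_id: str, message: str) -> str:
        # Schedule generation and return the id right away, without awaiting.
        self._tasks[media_id] = asyncio.create_task(self._generate(message))
        return media_id

    async def resolve(self, media_id: str) -> bytes:
        # Waits here until the background task has produced the audio.
        return await self._tasks[media_id]

    async def _generate(self, message: str) -> bytes:
        await asyncio.sleep(0.01)  # stand-in for a slow TTS engine
        return message.encode()


async def demo() -> tuple[bool, bytes]:
    jobs = TtsJobs()
    media_id = jobs.start("tts-1", "hello")
    # The id is available while generation is still pending.
    pending = not jobs._tasks[media_id].done()
    audio = await jobs.resolve(media_id)
    return pending, audio
```

Calling `asyncio.run(demo())` shows the id being handed back before the audio exists, with only `resolve` waiting for the result.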

Proposed change

Different TTS systems produce audio in different formats, some of which are not compatible with many media players. Although some TTS systems support ATTR_AUDIO_OUTPUT to change the output format, there is no guarantee that a given TTS system can generate the requested audio format. Wyoming, for example, can only generate WAV files.

This PR adds several things to TTS:

  1. A new ATTR_PREFERRED_FORMAT option lets the caller select a different audio format than what the TTS natively generates, such as "wav". Unless provided, it defaults to MP3.
  2. Two additional options, ATTR_PREFERRED_SAMPLE_RATE and ATTR_PREFERRED_SAMPLE_CHANNELS, allow the caller to control the exact details of the final audio. This is required for ESPHome to stream audio to speakers.
  3. All TTS audio generation is now non-blocking.
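As a rough sketch of how the three options above could drive a conversion step, the snippet below builds an ffmpeg argument list from an options dict. The string values of the constants, the `build_ffmpeg_args` helper, and the file names are assumptions made for illustration; the actual conversion code in this PR may differ. Only standard ffmpeg flags are used (`-y`, `-i`, `-f`, `-ar`, `-ac`).

```python
# Assumed string values; the PR only names the constants themselves.
ATTR_PREFERRED_FORMAT = "preferred_format"
ATTR_PREFERRED_SAMPLE_RATE = "preferred_sample_rate"
ATTR_PREFERRED_SAMPLE_CHANNELS = "preferred_sample_channels"


def build_ffmpeg_args(src: str, dst: str, options: dict) -> list[str]:
    """Hypothetical helper: translate TTS options into an ffmpeg command line."""
    # MP3 is the default when no preferred format is given.
    fmt = options.get(ATTR_PREFERRED_FORMAT, "mp3")
    args = ["ffmpeg", "-y", "-i", src, "-f", fmt]

    rate = options.get(ATTR_PREFERRED_SAMPLE_RATE)
    if rate is not None:
        args += ["-ar", str(rate)]  # output sample rate in Hz

    channels = options.get(ATTR_PREFERRED_SAMPLE_CHANNELS)
    if channels is not None:
        args += ["-ac", str(channels)]  # number of output channels

    args.append(dst)
    return args
```

With an empty options dict this yields a plain MP3 conversion; passing a format of "wav", a sample rate of 16000, and 1 channel yields the kind of request described for ESPHome.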

Lastly, the ESPHome voice assistant code has been updated to request 16 kHz 16-bit mono WAV audio when it will be streamed back to the client. This should now work with any TTS system.
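For anyone wanting to check that audio really matches the 16 kHz 16-bit mono WAV shape, a minimal sketch using only the standard library `wave` module might look like this. Both helper names are hypothetical and are not part of the ESPHome or TTS code.

```python
import io
import wave


def make_silent_wav(rate: int, width: int, channels: int, frames: int = 160) -> bytes:
    """Hypothetical test helper: build a short silent WAV in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setframerate(rate)
        wav_file.setsampwidth(width)  # bytes per sample, 2 means 16-bit
        wav_file.setnchannels(channels)
        wav_file.writeframes(bytes(frames * width * channels))
    return buf.getvalue()


def is_16khz_16bit_mono(wav_bytes: bytes) -> bool:
    """Return True when the WAV header reports 16 kHz, 16-bit, single channel."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return (
            wav_file.getframerate() == 16000
            and wav_file.getsampwidth() == 2
            and wav_file.getnchannels() == 1
        )
```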

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant

Hey there @balloob, mind taking a look at this pull request as it has been labeled with an integration (wyoming) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of wyoming can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign wyoming Removes the current integration label and assignees on the pull request, add the integration domain after the command.

@home-assistant

Hey there @home-assistant/core, @pvizeli, mind taking a look at this pull request as it has been labeled with an integration (tts) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of tts can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign tts Removes the current integration label and assignees on the pull request, add the integration domain after the command.

@synesthesiam synesthesiam mentioned this pull request Oct 25, 2023
20 tasks
@synesthesiam synesthesiam marked this pull request as ready for review October 26, 2023 02:20
@synesthesiam synesthesiam requested review from balloob, pvizeli and a team as code owners October 26, 2023 02:20
@synesthesiam synesthesiam changed the title Add ATTR_PREFERRED_FORMAT to TTS for auto-converting audio Automatically convert TTS audio to MP3 on demand Oct 27, 2023
@synesthesiam synesthesiam marked this pull request as draft October 27, 2023 20:07
@synesthesiam
Contributor Author

TODO: The TTS memory and file caches need to know about the multiple formats available. At the moment, the converted MP3 replaces the original audio in the cache after conversion. This means the original files will not be accessible without probing the cache directory.
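One way to picture a cache that knows about multiple formats is to key entries by both the cache key and the audio format, so a converted copy sits next to the original instead of replacing it. This is a hypothetical sketch, not how the PR resolved the TODO; `MultiFormatCache` and its methods are invented for illustration.

```python
from __future__ import annotations


class MultiFormatCache:
    """Hypothetical in-memory cache keyed by (cache key, audio format)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], bytes] = {}

    def put(self, key: str, fmt: str, audio: bytes) -> None:
        # Storing under (key, fmt) keeps other formats of the same key intact.
        self._data[(key, fmt)] = audio

    def get(self, key: str, fmt: str) -> bytes | None:
        return self._data.get((key, fmt))

    def formats(self, key: str) -> list[str]:
        # All formats currently cached for this key, sorted for stable output.
        return sorted(f for (k, f) in self._data if k == key)
```

With this layout, adding an MP3 for a key that already holds a WAV leaves both retrievable.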

@balloob balloob merged commit ae516ff into dev Nov 6, 2023
53 checks passed
@balloob balloob deleted the synestheisam-20231025-tts-autoconvert branch November 6, 2023 20:26
if proc.returncode != 0:
    _LOGGER.error(stderr.decode())
    raise RuntimeError(
        f"Unexpected error while running ffmpeg with arguments: {command}. See log for details."
Member

Please break long strings around max 88 characters per line.
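A minimal sketch of what the requested change could look like, using Python's implicit concatenation of adjacent string literals to keep each source line under 88 characters. The `ffmpeg_error` wrapper is hypothetical and exists only to make the example self-contained; the message text is the one from the reviewed line.

```python
def ffmpeg_error(command: list[str]) -> RuntimeError:
    """Hypothetical wrapper: build the error with the message split across lines."""
    return RuntimeError(
        # Adjacent literals are joined at compile time, so the message is unchanged.
        f"Unexpected error while running ffmpeg with arguments: {command}. "
        "See log for details."
    )
```

Only the first literal needs the `f` prefix, since the second contains no placeholders.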

@github-actions github-actions bot locked and limited conversation to collaborators Nov 7, 2023
5 participants