feat(tts): add Google Cloud TTS provider and markdown stripping #100

Merged
grinev merged 5 commits into grinev:main from georgernstgraf:feat/google-cloud-tts
Apr 24, 2026

Conversation

@georgernstgraf
Contributor

Builds on #63 — adds an alternative TTS backend and ensures clean speech output.

Summary

  • Google Cloud TTS provider: New TTS_PROVIDER=google option alongside the existing OpenAI-compatible path. Uses @google-cloud/text-to-speech SDK with a singleton client. Language code is auto-extracted from the voice name (e.g. de-DE from de-DE-Neural2-B), so BOT_LOCALE doesn't need to match.
  • Markdown stripping: All text is cleaned before synthesis — removes **bold**, *italic*, `code`, fenced code blocks, links, headings, list markers, HTML tags, etc. This prevents TTS engines from reading markdown syntax aloud (asterisks, backticks, etc.).
  • 20 new tests: Covers markdown stripping, language code extraction, Google provider detection, and markdown-stripped synthesis. All 713 tests pass.
  • Updated .env.example: Includes Google Cloud setup instructions and free tier info (1M chars/month for Neural2/WaveNet, 4M/month for Standard).
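The markdown-stripping step described in the summary can be pictured roughly as follows. This is only a minimal sketch: the name `stripMarkdownSketch` and every regular expression in it are assumptions made for illustration, not the PR's actual `stripMarkdownForSpeech` implementation, and it will mis-handle edge cases such as nested emphasis.

```typescript
// Hypothetical sketch of markdown stripping before speech synthesis.
// Not the PR's code; patterns are illustrative and deliberately simple.
export function stripMarkdownSketch(input: string): string {
  return input
    // Fenced code blocks: drop them entirely, they rarely read well aloud.
    .replace(/`{3}[\s\S]*?`{3}/g, "")
    // Inline code: keep the inner text, drop the backticks.
    .replace(/`([^`]*)`/g, "$1")
    // Images ![alt](url): keep the alt text.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Links [text](url): keep the link text.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    // Headings: remove leading # markers.
    .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, "")
    // List markers: "-", "*", "+", or "1." at the start of a line.
    .replace(/^[ \t]*(?:[-*+]|\d+\.)[ \t]+/gm, "")
    // Bold: **text** or __text__.
    .replace(/(\*\*|__)(.+?)\1/g, "$2")
    // Italic: *text*, only when the asterisks hug the text.
    .replace(/\*(\S(?:.*?\S)?)\*/g, "$1")
    // HTML tags: require a letter after "<" so "1 < 2" survives.
    .replace(/<\/?[a-zA-Z][^<>]*>/g, "")
    // Collapse whitespace left behind by the removals.
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

Underscore italics are left alone in this sketch so that identifiers like `snake_case` are not mangled; whether the PR makes the same trade-off is not stated in the thread.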

Config

# OpenAI-compatible (default, no change for existing users)
TTS_PROVIDER=openai
TTS_API_URL=https://api.openai.com/v1
TTS_API_KEY=sk-...
TTS_VOICE=alloy

# Google Cloud TTS
TTS_PROVIDER=google
TTS_VOICE=en-US-Studio-O
GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

Backward compatibility

Fully backward-compatible. Default provider is "openai", so existing deployments continue to work without any changes. The stripMarkdownForSpeech function applies to both providers.
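One way to read the "default provider is openai" guarantee is as a small normalization step on `TTS_PROVIDER`. The sketch below is an assumption about how such a step could look; the function name `normalizeTtsProviderSketch` is hypothetical, and rejecting unknown values with an error (rather than silently falling back) is a choice made here, not something the thread confirms.

```typescript
// Hypothetical sketch: normalize TTS_PROVIDER against an allowlist.
// The allowlist values come from the PR description; everything else is assumed.
const KNOWN_TTS_PROVIDERS = ["openai", "google"] as const;
type TtsProviderName = (typeof KNOWN_TTS_PROVIDERS)[number];

export function normalizeTtsProviderSketch(raw: string | undefined): TtsProviderName {
  const value = (raw ?? "").trim().toLowerCase();
  // Unset or empty keeps existing deployments on the OpenAI-compatible path.
  if (value === "") return "openai";
  const known = KNOWN_TTS_PROVIDERS.find((p) => p === value);
  if (known === undefined) {
    // Assumption: fail loudly on typos instead of guessing a provider.
    throw new Error(`Unsupported TTS_PROVIDER "${value}" (expected openai or google)`);
  }
  return known;
}
```

Called as `normalizeTtsProviderSketch(process.env.TTS_PROVIDER)`, an existing deployment without the variable would stay on `openai`.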

Closes #63

Copilot AI review requested due to automatic review settings April 23, 2026 08:35

Copilot AI left a comment

Pull request overview

Adds a Google Cloud Text-to-Speech backend option and ensures TTS input is cleaned of Markdown so spoken output is natural.

Changes:

  • Added TTS_PROVIDER config with a new google provider implemented via @google-cloud/text-to-speech.
  • Introduced stripMarkdownForSpeech() to remove common Markdown/HTML markers before synthesis (applies to all providers).
  • Expanded Vitest coverage for markdown stripping, language code extraction, and provider configuration detection.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/tts/client.ts | Adds Google TTS provider implementation, markdown stripping, and provider routing logic. |
| src/config.ts | Adds tts.provider config sourced from TTS_PROVIDER. |
| tests/tts/client.test.ts | Adds tests for provider config detection, markdown stripping, and language code extraction. |
| package.json | Adds @google-cloud/text-to-speech dependency. |
| package-lock.json | Locks new Google Cloud dependency tree. |
| .env.example | Documents TTS_PROVIDER=google setup and Google credentials configuration. |

Comment thread src/tts/client.ts Outdated
Comment thread src/config.ts Outdated
Comment thread src/tts/client.ts
Comment thread src/tts/client.ts
Comment thread src/tts/client.ts Outdated
Comment thread src/tts/client.ts Outdated
Comment thread src/tts/client.ts Outdated
Contributor Author

@georgernstgraf georgernstgraf left a comment

Works very well for me, and it runs on the free tier.

I did my best to resolve Copilot's (important) comments.

I did not manually test whether OpenAI TTS still works.

The markdown stripping does quite a good job for me, but it is still far from perfect.
Currently the voices leave out pauses where some belong. I want to investigate further.

Planned by me & glm-5.1, coded by glm-5.1, reviewed by me ;)

This is such an awesome setup, thank you for this great repo!

@grinev
Owner

grinev commented Apr 24, 2026

@georgernstgraf thanks for the PR! The overall direction looks good, but could you please address a few issues before we merge?

  1. For Google TTS, the default voice currently still resolves to alloy from config. This means TTS_PROVIDER=google without TTS_VOICE will likely send alloy to Google and fail. Please use a Google-specific default voice, for example en-US-Studio-O.

  2. extractLanguageCode() only supports voice names like de-DE-*. Some Google voices use 3-letter language codes, for example cmn-CN-* or yue-HK-*. Please update the extraction logic and add tests for these cases.

  3. isTtsConfigured() always returns true for Google. This can let users enable /tts even when Google credentials are not actually available. Please either check GOOGLE_APPLICATION_CREDENTIALS or document/handle ADC clearly so users do not get repeated synthesis failures at runtime.
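The three review points above could be addressed along the following lines. This is a sketch under stated assumptions, not the code that was merged: the `...Sketch` function names, the voice-name pattern, and the rule that the OpenAI-compatible path needs both `TTS_API_URL` and `TTS_API_KEY` are all hypothetical. The credentials check shows only the stricter of the two options the review offers; it would not detect Application Default Credentials obtained another way, and the commit list in this thread mentions returning true for Google to support ADC.

```typescript
// Hypothetical sketches for the three review points; not the merged code.
type VoiceProvider = "openai" | "google";
type EnvLike = Record<string, string | undefined>;

// Point 1: provider-specific default voice (values taken from the thread).
const DEFAULT_VOICE_BY_PROVIDER: Record<VoiceProvider, string> = {
  openai: "alloy",
  google: "en-US-Studio-O",
};

export function resolveVoiceSketch(provider: VoiceProvider, configuredVoice?: string): string {
  const voice = configuredVoice?.trim();
  return voice ? voice : DEFAULT_VOICE_BY_PROVIDER[provider];
}

// Point 2: accept 2- or 3-letter language subtags, e.g. de-DE, cmn-CN, yue-HK.
// Assumption: voice names start with "<language>-<REGION>" followed by "-" or end of string.
export function extractLanguageCodeSketch(voiceName: string): string | undefined {
  const match = /^([a-z]{2,3})-([A-Z]{2})(?:-|$)/.exec(voiceName);
  return match ? `${match[1]}-${match[2]}` : undefined;
}

// Point 3: the stricter option, requiring an explicit credentials path for Google.
export function isTtsConfiguredSketch(provider: VoiceProvider, env: EnvLike): boolean {
  if (provider === "google") {
    return Boolean(env["GOOGLE_APPLICATION_CREDENTIALS"]?.trim());
  }
  // Assumption: the OpenAI-compatible path needs both a URL and a key.
  return Boolean(env["TTS_API_URL"]?.trim() && env["TTS_API_KEY"]?.trim());
}
```

Passing the environment in as a plain record keeps the check easy to unit-test without touching `process.env`.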

Georg Graf added 5 commits April 24, 2026 21:28
- Add @google-cloud/text-to-speech dependency
- Support TTS_PROVIDER=google alongside existing OpenAI-compatible API
- Auto-extract language code from voice name (e.g. de-DE from de-DE-Neural2-B)
- Add stripMarkdownForSpeech() to remove **bold**, *italic*, code blocks,
  links, headings, list markers, and HTML before synthesis
- Prevents TTS from reading markdown syntax aloud (asterisks, backticks etc.)
- Remove redundant inline comments (AI slop)
- Reuse Google TTS client as singleton instead of per-request
- Export stripMarkdownForSpeech and extractLanguageCode for testability
- Add 20 tests: markdown stripping, language code extraction,
  Google provider detection, markdown-stripped synthesis
- Add TTS_PROVIDER and GOOGLE_APPLICATION_CREDENTIALS to .env.example
- Validate and normalize TTS_PROVIDER against allowlist (openai|google)
- Narrow HTML stripping regex to only match actual tags (preserve < > comparisons)
- Add timeout to Google SDK calls consistent with OpenAI path
- Explicitly convert audioContent to Buffer (handles Uint8Array from gRPC)
- Return true for Google provider in isTtsConfigured (ADC support)
- Provider-specific error messages for misconfiguration
- Add 5 Google SDK tests with mocked TextToSpeechClient
- Add _resetGoogleClient export for test singleton cleanup
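The commit notes mention adding a timeout to the Google SDK calls. How the PR does this is not shown in the thread; one generic way to bound any promise-returning call is a `Promise.race` wrapper like the sketch below. The helper name `withTimeoutSketch` is hypothetical, and this approach only stops waiting, it does not cancel the underlying request.

```typescript
// Hypothetical sketch: bound a promise-returning call with a timeout.
// Not the PR's code; the thread does not show how its timeout is implemented.
export async function withTimeoutSketch<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever settles first wins; the loser is simply ignored.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

A synthesis call could then be wrapped as `withTimeoutSketch(synthesize(text), 30_000, "Google TTS")`, where `synthesize` stands in for whatever function performs the request.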
@georgernstgraf georgernstgraf force-pushed the feat/google-cloud-tts branch from 373825b to eaa02df Compare April 24, 2026 19:34
@grinev
Owner

grinev commented Apr 24, 2026

@georgernstgraf thanks for the new feature contribution!

@grinev grinev merged commit 52fc925 into grinev:main Apr 24, 2026
1 check passed
@georgernstgraf georgernstgraf deleted the feat/google-cloud-tts branch April 24, 2026 22:56

Development

Successfully merging this pull request may close these issues.

[Feature]: optional TTS replies for voice/audio prompts

3 participants