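This is a text-to-speech (TTS) plugin system developed for SillyTavern.
- Multiple TTS providers: supports OpenAI, ElevenLabs, Edge TTS, and other TTS services
- Smart message reading: automatically reads the mes_text content of chat messages for speech synthesis
- Floating button control: a quick-access floating button that reads the latest message aloud when clicked
- Multi-voice support: assign different voices to different characters
- Automatic HTML tag filtering: strips HTML tags from messages so that markup does not degrade the synthesized speech
- Smart text extraction: reads message content directly from the DOM element div.mes_text, removing all HTML tags
- Floating control button: a floating button in the bottom-right corner of the page; click it to read the latest message aloud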
- Select your preferred TTS provider in the TTS settings
- Configure the API endpoint and key (if required)
- Assign voices to characters
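- Location: a purple circular button with a volume icon in the bottom-right corner of the page
- Function:
  - Click the button: automatically reads the latest chat message aloud
  - Click again during playback: stops the current playback
  - During playback the button turns pink and shows a pulse animation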
 
- Smart processing:
  - Automatically extracts text from the .mes_text element
  - Removes all HTML tags (<p>, <br>, etc.)
  - Reads only the plain text content
 
- Status indication:
  - Shows operation status through toastr notifications
  - Button color and animation reflect the playback state
 
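- Endpoint configuration: set your TTS API endpoint
- Model selection: choose the TTS model (e.g. tts-1)
- Available voices: configure the list of available voices (comma-separated)
- Speed control: adjust the speech playback speed (0.25-4.0)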
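
As a rough illustration of how these settings might be validated before use, the sketch below splits the comma-separated voice list and clamps the speed into the 0.25-4.0 range. The function name `normalizeTtsSettings` and the field names (`endpoint`, `model`, `voices`, `speed`) are hypothetical, not the plugin's actual setting keys, and the fallback speed of 1.0 is an assumption.

```javascript
// Hypothetical helper: normalizes user-entered TTS settings.
// Field names are illustrative, not the plugin's real setting keys.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

function normalizeTtsSettings(raw) {
    // Split the comma-separated voice list and drop empty entries.
    const voices = String(raw.voices ?? '')
        .split(',')
        .map((v) => v.trim())
        .filter((v) => v.length > 0);

    // Clamp the speed into the 0.25-4.0 range; assume 1.0 when it is not a number.
    const parsed = Number(raw.speed);
    const speed = Number.isFinite(parsed)
        ? Math.min(MAX_SPEED, Math.max(MIN_SPEED, parsed))
        : 1.0;

    return {
        endpoint: String(raw.endpoint ?? '').trim(),
        model: String(raw.model ?? 'tts-1').trim(),
        voices,
        speed,
    };
}
```

Clamping rather than rejecting out-of-range values keeps a mistyped speed from blocking playback entirely; a stricter plugin could show an error instead.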
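The system extracts text content from div.mes_text:
- Find the message blocks (.mes_block) on the page
- Locate the .mes_text element inside the block
- Extract the text content and remove all HTML tags (including <p>, <br>, etc.)
- Send the cleaned text to the TTS engine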
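The textContent property is used, which returns text without any HTML tags, ensuring:
- No <p> paragraph tags
- No <br> line-break tags
- No other HTML formatting tags
- Plain text content and natural spacing are preserved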
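
The extraction steps can be sketched roughly as follows. This is not the plugin's own code: `readLatestMessageText` is a hypothetical name, and the function takes a `root` argument (normally `document`) only so that it can be exercised without a browser. Trimming the result and returning `null` for empty text are assumptions.

```javascript
// Sketch: read the plain text of the newest message.
// `root` is any object exposing querySelectorAll (normally `document`).
function readLatestMessageText(root) {
    const blocks = root.querySelectorAll('.mes_block');
    if (blocks.length === 0) return null;

    // The last .mes_block in document order is treated as the latest message.
    const latest = blocks[blocks.length - 1];
    const mesText = latest.querySelector('.mes_text');
    if (!mesText) return null;

    // textContent yields only text nodes, so <p>, <br>, etc. never appear.
    const text = (mesText.textContent ?? '').trim();
    return text.length > 0 ? text : null;
}
```

In a browser this would be called as `readLatestMessageText(document)`.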
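Added the extractTextFromMesBlock() method:
// Extracts the mes_text content of the latest message from the DOM
// Uses querySelector to locate the element
// Uses textContent to drop HTML tags automatically
Modified the fetchTtsGeneration() method: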
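- Before sending the API request, extract text from the DOM first
- If extraction succeeds, use the extracted text; otherwise fall back to the original inputText parameter
- Preserves backward compatibility and does not break existing functionality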
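
The fallback rule can be expressed as a small pure function. This is a minimal sketch under the assumption that a failed extraction shows up as `null`, `undefined`, or a blank string; `resolveTtsInput` is a hypothetical name and not part of the plugin.

```javascript
// Hypothetical helper: prefer text extracted from the DOM,
// otherwise fall back to the caller's original inputText.
function resolveTtsInput(extractedText, inputText) {
    const cleaned = typeof extractedText === 'string' ? extractedText.trim() : '';
    return cleaned.length > 0 ? cleaned : inputText;
}
```

Because the original argument is returned untouched when nothing was extracted, callers that never relied on DOM extraction see the same behavior as before.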
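New functions:
- extractLatestMesText() - general-purpose function that extracts the latest message text
- onFloatingButtonClick() - handles the floating button click event
- addFloatingButton() - creates the floating button and adds it to the page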
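Interaction logic:
- On click, check whether TTS is enabled
- Extract the plain text of the latest message
- If playback is in progress, stop it; otherwise start a new playback
- Add visual feedback (button color change and animation)
- Remove the playing state automatically when playback ends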
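
The click handling can be sketched as a small toggle. Everything here is illustrative: `createFloatingButtonHandler`, the `player` object with `play(text, onEnd)` and `stop()`, and the `playing` CSS class are assumptions rather than the plugin's real names, and the sketch checks the playing state before extracting text so that a stop click always works.

```javascript
// Hypothetical click controller for the floating button.
// `player` and `button` only need the minimal shape used below.
function createFloatingButtonHandler({ isTtsEnabled, getLatestText, player, button }) {
    let playing = false;

    const setPlaying = (value) => {
        playing = value;
        // Toggle the class that drives the color change and pulse animation.
        button.classList.toggle('playing', value);
    };

    return function onClick() {
        if (!isTtsEnabled()) return 'disabled';

        // A second click during playback stops it.
        if (playing) {
            player.stop();
            setPlaying(false);
            return 'stopped';
        }

        const text = getLatestText();
        if (!text) return 'empty';

        setPlaying(true);
        // The player invokes the callback when playback ends,
        // which clears the visual playing state.
        player.play(text, () => setPlaying(false));
        return 'started';
    };
}
```

The returned status strings exist only to make the sketch easy to inspect; a real handler would more likely show a toastr notification.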
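Floating button style features:
- Fixed positioning: position: fixed in the bottom-right corner
- Gradient background: purple gradient, switching to a pink gradient during playback
- Interaction feedback:
  - Scales up on hover (scale(1.1))
  - Scales down on click (scale(0.95))
  - Pulse animation during playback
- High stacking order: z-index: 9999 keeps it always visible
- Responsive design: circular button, 60x60px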
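- ✅ All code includes Chinese comments for easier understanding
- ✅ Follows the principle of minimal changes; existing functionality is not broken
- ✅ Detailed JSDoc comments added
- ✅ Includes error handling and edge-case checks
- ✅ Passes linter checks with no syntax errors
- ✅ Backward compatible; the existing API interface is unchanged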
Because I don't know how, or if you can, and/or maybe I am just too lazy to implement interfaces in JS, here are the requirements of a provider that the extension needs to operate.
Exported for use in extension index.js, and added to providers list in index.js
- generateTts(text, voiceId)
- fetchTtsVoiceObjects()
- onRefreshClick()
- checkReady()
- loadSettings(settingsObject)
- settings field
- settingsHtml field
- previewTtsVoice()
- separator field
- processText(text)
- dispose()
Must return audioData.type in ['audio/mpeg', 'audio/wav', 'audio/x-wav', 'audio/wave', 'audio/webm']
Must take text to be rendered and the voiceId to identify the voice to be used
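
A provider can guard against returning an unsupported format with a check like the one below. This is a sketch: `isSupportedAudioType` is a hypothetical helper, and the exact-match comparison is an assumption about how strictly the type is checked.

```javascript
// MIME types listed in the requirement above.
const SUPPORTED_AUDIO_TYPES = [
    'audio/mpeg',
    'audio/wav',
    'audio/x-wav',
    'audio/wave',
    'audio/webm',
];

// Hypothetical helper: true when the audio data's type is one of the accepted values.
function isSupportedAudioType(type) {
    return SUPPORTED_AUDIO_TYPES.includes(type);
}
```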
Required. Used by the TTS extension to get a list of voice objects from the provider. Must return a list of voice objects representing the available voices.
- name: a friendly user facing name to assign to characters. Shows in dropdown list next to user.
- voice_id: the provider specific id of the voice used in fetchTtsGeneration() call
- preview_url: a URL to a local audio file that will be used to sample voices
- lang: OPTIONAL language string
Required. Must return a single voice object matching the provided voiceName. The voice object must have the following at least:
- name: a friendly user facing name to assign to characters. Shows in dropdown list next to user.
- voice_id: the provider specific id of the voice used in fetchTtsGeneration() call
- preview_url: a URL to a local audio file that will be used to sample voices
- lang: OPTIONAL language indicator
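
For illustration, voice objects with these fields and a lookup by name might look like the sketch below. The voice names, ids, and preview paths are made up, `findVoiceByName` is a hypothetical helper name, and throwing when no voice matches is an assumption.

```javascript
// Example voice objects; names, ids and URLs are invented for this sketch.
const exampleVoices = [
    { name: 'Narrator', voice_id: 'voice-001', preview_url: '/sounds/narrator.wav', lang: 'en-US' },
    { name: 'Guide', voice_id: 'voice-002', preview_url: '/sounds/guide.wav' }, // lang is optional
];

// Hypothetical helper: return the single voice object whose name matches.
function findVoiceByName(voiceObjects, voiceName) {
    const match = voiceObjects.find((voice) => voice.name === voiceName);
    if (!match) {
        throw new Error(`TTS voice not found: ${voiceName}`);
    }
    return match;
}
```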
Required. Responds to the user clicking the refresh button, which is intended to reconnect or re-initialize the selected provider into a working state, for example by retrying connections or checking that everything is loaded.
Required. Return without error to let the TTS extension know that the provider is ready. Return an error to block the main TTS extension from initializing the provider and UI. The error will be put in the TTS extension UI directly.
Required. Handle the input settings from the TTS extension on provider load. Put code in here to load your provider settings.
Required, used for storing any provider state that needs to be saved.
Anything stored in this field is automatically persisted under extension_settings[providerName] by the main extension in saveTtsProviderSettings(), as well as loaded when the provider is selected in loadTtsProvider(provider).
TTS extension doesn't expect any specific contents.
Required, injected into the TTS extension UI. Besides adding it, not relied on by TTS extension directly.
Optional. Function to handle playing previews of voice samples if no direct preview_url is available in the fetchTtsVoiceObjects() response.
Optional.
Used when narrate quoted text is enabled.
Defines the string of characters used to introduce separation between the groups of extracted quoted text sent to the provider. The provider will use this to introduce pauses by default using ...
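
To show what the separator does, here is a minimal sketch that pulls double-quoted spans out of a message and joins them. `joinQuotedText` is a hypothetical function, the regular expression only handles straight double quotes, and the `' ... '` default (with surrounding spaces) is an assumption.

```javascript
// Hypothetical helper: keep only "double-quoted" spans and join them
// with the provider's separator so the engine pauses between them.
function joinQuotedText(text, separator = ' ... ') {
    const quotes = [...text.matchAll(/"([^"]+)"/g)].map((match) => match[1].trim());
    return quotes.join(separator);
}
```

For example, `joinQuotedText('She said "hello" and then "goodbye".')` yields `hello ... goodbye`, and a provider that pauses better on periods could pass `' . '` instead.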
Optional. A function applied to the input text before passing it to the TTS generator. Can be async.
Optional. Function to handle cleanup of provider resources when the provider is switched.
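
Putting the requirements together, a provider skeleton might look roughly like this. It is a sketch under stated assumptions, not a working provider: the class name, `defaultSettings`, the settings keys, the preview paths, and the OpenAI-style request body are all hypothetical; signalling "not ready" by throwing and returning the fetch `Response` from `generateTts` are assumptions as well.

```javascript
// Skeleton provider sketch. Everything other than the required member
// names is hypothetical and must be adapted to the real TTS service.
class ExampleTtsProvider {
    constructor() {
        this.defaultSettings = { endpoint: '', model: 'tts-1', voices: [] };
        this.settings = { ...this.defaultSettings };
        this.separator = ' ... ';
        this.settingsHtml = '<label>Endpoint <input id="example_tts_endpoint" type="text"></label>';
    }

    // Merge saved settings over the defaults when the provider is loaded.
    async loadSettings(settingsObject) {
        this.settings = { ...this.defaultSettings, ...settingsObject };
    }

    // Assumed convention: throwing signals that the provider is not ready.
    async checkReady() {
        if (!this.settings.endpoint) {
            throw new Error('Example TTS: endpoint is not configured');
        }
    }

    // Re-run the readiness check when the user clicks refresh.
    async onRefreshClick() {
        await this.checkReady();
    }

    async fetchTtsVoiceObjects() {
        return this.settings.voices.map((name) => ({
            name,
            voice_id: name,
            preview_url: `/sounds/${name}.wav`, // made-up path
            lang: 'en-US',
        }));
    }

    // Assumed OpenAI-style request body; adjust to the actual API.
    async generateTts(text, voiceId) {
        const response = await fetch(this.settings.endpoint, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ model: this.settings.model, input: text, voice: voiceId }),
        });
        if (!response.ok) {
            throw new Error(`Example TTS: HTTP ${response.status}`);
        }
        return response;
    }

    // Optional hook: tidy the text before synthesis.
    processText(text) {
        return text.trim();
    }

    // Optional hook: nothing to release in this sketch.
    dispose() {}
}
```

The optional `previewTtsVoice()` member is omitted here because every voice object in the sketch carries a `preview_url`.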