# 🗣️ Speech-AI-Forge Colab

👋 This notebook is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If the project helps you, please give it a star on GitHub. Pull requests and issues are welcome too.

## How to Run

1. In the menu bar, open **Runtime**.
2. Click **Run all**.

When the run finishes, look for the following line in the log output below:

```
Running on public URL: https://**.gradio.live
```

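That link is the public URL where you can access the WebUI.

> Note: if you are prompted to restart the runtime while packages are being installed, choose "No".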
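
If the log is long, the public URL can also be pulled out programmatically. The snippet below is a minimal sketch that assumes the log line keeps the `Running on public URL: https://<subdomain>.gradio.live` shape shown above; the helper name `find_public_url` and the sample subdomain `abc123` are made up for illustration.

```python
import re
from typing import Optional

# Matches the line printed when sharing succeeds; the subdomain differs on every run.
PUBLIC_URL_PATTERN = re.compile(r"Running on public URL:\s*(https://\S+\.gradio\.live)")


def find_public_url(log_text: str) -> Optional[str]:
    """Return the first public gradio.live URL found in log_text, or None."""
    match = PUBLIC_URL_PATTERN.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt; "abc123" is a placeholder subdomain.
sample_log = "starting server\nRunning on public URL: https://abc123.gradio.live\n"
print(find_public_url(sample_log))
```

The function returns `None` when no share link is present, which can happen if the tunnel could not be created.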

## Environment Setup

In [1]:
%%capture
# Skip restarting message in Colab
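import sys

# Evict already-imported PIL and google modules; iterate over a copy of the keys
# because sys.modules is mutated inside the loop.
for name in list(sys.modules):
    if "PIL" in name or "google" in name:
        sys.modules.pop(name)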

# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install dependencies
%pip install -r requirements.txt
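
Because the cell above runs under `%%capture`, a failed `apt-get` step is easy to miss. The following is a small sketch for checking afterwards that the system tools are on `PATH`. It assumes the packages install executables named `ffmpeg` and `rubberband`; the helper `missing_tools` is hypothetical and not part of Speech-AI-Forge.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumption: the apt packages provide executables with these names.
missing = missing_tools(["ffmpeg", "rubberband"])
if missing:
    print("Missing executables:", ", ".join(missing))
else:
    print("ffmpeg and rubberband are available.")
```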

## Download Models

In [None]:
import os

from scripts import dl_chattts, dl_enhance
from scripts.downloader import (
    fish_speech_1_4,
    cosyvoice2,
    fire_red_tts,
    vocos_mel_24khz,
    faster_whisper,
    open_voice,
    f5_tts_v1,
    index_tts,
    spark_tts,
)

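# @markdown ## About model downloads
# @markdown Most models are close to 2 GB in size, so make sure you have enough storage space and network bandwidth.  <br/>
# @markdown **Note**: you must select at least one TTS model. If none is selected, `CosyVoice` is downloaded by default.

# @markdown ### Hugging Face Token (optional)
# @markdown Some models may require a Hugging Face token to download. If you need those models, enter your token here. You can get a token from [Hugging Face](https://huggingface.co/settings/tokens).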
HF_TOKEN = ""  # @param {type:"string"}

if HF_TOKEN:
    os.environ["HF_TOKEN"] = HF_TOKEN  # Set the HF_TOKEN environment variable
    print("HF_TOKEN environment variable configured.")
else:
    print("No HF_TOKEN provided; skipping environment variable setup.")

# @markdown ### TTS models
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = True  # @param {"type":"boolean"}

# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown Index-TTS: [GitHub](https://github.com/index-tts/index-tts) - An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
download_index_tts = False  # @param {"type":"boolean"}

# @markdown Spark-TTS: [GitHub](https://github.com/SparkAudio/Spark-TTS) - Spark-TTS Inference
download_spark_tts = True  # @param {"type":"boolean"}

# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean"}

# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}

# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}

# @markdown ### ASR models
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Voice cloning models
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = True  # @param {"type":"boolean"}

# @markdown ### Enhancement models
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}


# Make sure at least one TTS model is selected
if not any(
    [
        download_chattts,
        download_fish_speech,
        download_cosyvoice,
        download_fire_red_tts,
        download_f5_tts,
        download_index_tts,
        download_spark_tts,
    ]
):
    print("No TTS model selected; downloading CosyVoice by default...")
    download_cosyvoice = True

dl_source = "huggingface"

# Download TTS models
if download_chattts:
    print("Downloading ChatTTS...")
    dl_chattts.ChatTTSDownloader()(source=dl_source)
    print("ChatTTS download complete.")

if download_fish_speech:
    print("Downloading FishSpeech...")
    fish_speech_1_4.FishSpeech14Downloader()(source=dl_source)
    print("FishSpeech download complete.")

if download_cosyvoice:
    print("Downloading CosyVoice...")
    cosyvoice2.CosyVoice2Downloader()(source=dl_source)
    print("CosyVoice download complete.")

if download_fire_red_tts:
    print("Downloading FireRedTTS...")
    fire_red_tts.FireRedTTSDownloader()(source=dl_source)
    print("FireRedTTS download complete.")

if download_f5_tts:
    print("Downloading F5TTS...")
    f5_tts_v1.F5TTSV1Downloader()(source=dl_source)
    vocos_mel_24khz.VocosMel24khzDownloader()(source=dl_source)
    print("F5TTS download complete.")

if download_index_tts:
    print("Downloading IndexTTS...")
    index_tts.IndexTTSDownloader()(source=dl_source)
    print("IndexTTS download complete.")

if download_spark_tts:
    print("Downloading SparkTTS...")
    spark_tts.SparkTTSDownloader()(source=dl_source)
    print("SparkTTS download complete.")

# Download ASR models
if download_whisper:
    print("Downloading Whisper...")
    faster_whisper.FasterWhisperDownloader()(source=dl_source)
    print("Whisper download complete.")

# Download voice cloning models
if download_open_voice:
    print("Downloading OpenVoice...")
    open_voice.OpenVoiceDownloader()(source=dl_source)
    print("OpenVoice download complete.")

# Download enhancement models
if download_enhancer:
    print("Downloading ResembleEnhance...")
    dl_enhance.ResembleEnhanceDownloader()(source=dl_source)
    print("ResembleEnhance download complete.")

print("All selected models have been downloaded!")
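
The "at least one TTS model" rule in the cell above can be looked at in isolation. The sketch below reproduces only that fallback logic with plain data, so it runs without the repository. The function name `resolve_tts_selection` and the lowercase keys are illustrative assumptions, not names from Speech-AI-Forge.

```python
def resolve_tts_selection(flags, default="cosyvoice"):
    """Return the names of enabled models, falling back to `default` if none is enabled.

    `flags` maps a model name to the boolean value of its checkbox.
    """
    selected = [name for name, enabled in flags.items() if enabled]
    return selected or [default]


# Hypothetical checkbox state: every TTS model unchecked.
all_off = {"chattts": False, "fish_speech": False, "cosyvoice": False}
print(resolve_tts_selection(all_off))

# Hypothetical checkbox state: two models checked.
some_on = {"chattts": True, "fish_speech": False, "spark_tts": True}
print(resolve_tts_selection(some_on))
```

Keeping the selection separate from the download calls makes it possible to check the fallback without fetching multi-gigabyte files.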

## Run the WebUI

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0


In [4]:
!nvidia-smi

Wed Aug 13 02:17:39 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   39C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
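
The table above is meant for reading by eye. To check the GPU from code, `nvidia-smi` can emit CSV instead. The following is a sketch under the assumption that `nvidia-smi` is on `PATH`; the helper names `parse_gpu_line` and `query_gpus` are made up here, and the sample line reuses the values from the table above.

```python
import subprocess


def parse_gpu_line(line):
    """Split one 'name, <N> MiB' CSV line into (name, total_memory_in_MiB)."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return name, int(memory.split()[0])


def query_gpus():
    """Ask nvidia-smi for the name and total memory of each GPU (requires an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


# Parsing a sample line that mirrors the table above; no GPU is needed for this part.
print(parse_gpu_line("Tesla T4, 15360 MiB"))
```

Calling `query_gpus()` on a runtime without a GPU raises an exception, which is a quick signal that the runtime type needs to be changed before the WebUI is started.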
                                                

In [None]:
!python webui.py --share --language=zh-CN

2025-08-13 02:17:46,095 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2025-08-13 02:17:54,024 - root - INFO - new registry table has been added: preprocessor_classes
2025-08-13 02:17:55,273 - root - INFO - new registry table has been added: adaptor_classes
2025-08-13 02:17:57,755 - root - INFO - new registry table has been added: lid_predictor_classes
2025-08-13 02:18:03.577152: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1755051483.854821    5830 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755051483.933838    5830 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1755051484.503168    5830 computation_placer.cc:177] comput