# 🗣️ Speech-AI-Forge Colab

👋 This notebook is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If the project helps you, please consider giving it a star on GitHub. Pull requests and issues are welcome too.

## Usage Guide

1. Select **Runtime** from the menu bar.
2. Click **Run all**.

When the run finishes, look for the following line in the log output below:

```
Running on public URL: https://**.gradio.live
```

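This link is the public address you can use to open the WebUI.

> Note: If you are prompted to restart while packages are being installed, choose "No".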
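
When the log is long, the address can also be picked out programmatically. The following is a minimal sketch, assuming the log text is already available as a Python string. `find_public_url` is a hypothetical helper name, and the pattern simply mirrors the `Running on public URL:` line shown above.

```python
import re

# Mirrors the share-link line shown in the log, e.g.
# "Running on public URL: https://xxxx.gradio.live"
PUBLIC_URL_PATTERN = re.compile(
    r"Running on public URL:\s*(https://\S+\.gradio\.live)"
)


def find_public_url(log_text):
    """Return the first public Gradio URL found in log_text, or None."""
    match = PUBLIC_URL_PATTERN.search(log_text)
    return match.group(1) if match else None


# Illustrative log text; the subdomain here is made up.
sample_log = (
    "some startup output\n"
    "Running on public URL: https://abc123.gradio.live\n"
)
print(find_public_url(sample_log))
```

Returning `None` when no link is present lets a caller tell "not started yet" apart from a found address.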

## Environment Setup

In [1]:
# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install PyTorch
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 5. Install dependencies
!pip install -r requirements.txt


Cloning into 'Speech-AI-Forge'...
remote: Enumerating objects: 5431, done.
remote: Counting objects: 100% (1789/1789), done.
remote: Compressing objects: 100% (699/699), done.
remote: Total 5431 (delta 1114), reused 1691 (delta 1070), pack-reused 3642 (from 1)
Receiving objects: 100% (5431/5431), 9.24 MiB | 6.09 MiB/s, done.
Resolving deltas: 100% (3518/3518), done.
/content/Speech-AI-Forge
Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,091 kB]
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpad
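
After the setup cell finishes, it can be useful to confirm that the installed pieces are actually visible to the runtime. The sketch below is one way to do that under two assumptions: that the `rubberband-cli` package exposes an executable named `rubberband`, and that a PATH lookup is a good enough check. `check_tools` is a hypothetical helper, not part of Speech-AI-Forge.

```python
import importlib.util
import shutil

# Command-line tools expected from the apt-get step.
# Assumption: rubberband-cli installs a binary called "rubberband".
REQUIRED_TOOLS = ["ffmpeg", "rubberband"]


def check_tools(tools):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in check_tools(REQUIRED_TOOLS).items():
    print(f"{tool}: {'found' if found else 'MISSING'}")

# find_spec only looks the package up; it does not import torch.
torch_installed = importlib.util.find_spec("torch") is not None
print(f"torch: {'installed' if torch_installed else 'MISSING'}")
```

The check only reports what is missing. It does not try to reinstall anything, so it is safe to rerun at any point.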

## Download Models

In [2]:
# @markdown ## Model download notes
# @markdown Most models are close to 2 GB each, so make sure you have enough storage space and network bandwidth.  <br/>
# @markdown **Note**: At least one TTS model must be selected. If none is selected, ChatTTS is downloaded by default.

# @markdown ### TTS models
# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean", "placeholder":"Download the ChatTTS model"}
# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = False  # @param {"type":"boolean"}
# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}
# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown ### ASR models
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Voice cloning models
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = False  # @param {"type":"boolean"}

# @markdown ### Enhancement models
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}

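def download_model(command):
    # `!{command}` is IPython/Colab shell syntax: it runs the string as a shell command.
    print(f"Executing: {command}")
    !{command}

# Make sure at least one TTS model is selected
if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):
    print("No TTS model selected; downloading ChatTTS by default...")
    download_chattts = True

# Download TTS models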
if download_chattts:
    download_model("python -m scripts.dl_chattts --source huggingface")

if download_fish_speech:
    download_model("python -m scripts.downloader.fish_speech_1_2sft --source huggingface")

if download_cosyvoice:
    download_model("python -m scripts.dl_cosyvoice_instruct --source huggingface")

if download_fire_red_tts:
    download_model("python -m scripts.downloader.fire_red_tts --source huggingface")

if download_f5_tts:
    download_model("python -m scripts.downloader.f5_tts --source huggingface")
    download_model("python -m scripts.downloader.vocos_mel_24khz --source huggingface")

# Download ASR models
if download_whisper:
    download_model("python -m scripts.downloader.faster_whisper --source huggingface")

# Download voice cloning models
if download_open_voice:
    download_model("python -m scripts.downloader.open_voice --source huggingface")

# Download enhancement models
if download_enhancer:
    download_model("python -m scripts.dl_enhance --source huggingface")

print("All selected models have been downloaded!")


Executing: python -m scripts.dl_chattts --source huggingface
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
Fetching 23 files:   0% 0/23 [00:00<?, ?it/s]
Decoder.pt:   0% 0.00/104M [00:00<?, ?B/s]
DVAE_full.pt:   0% 0.00/60.4M [00:00<?, ?B/s]
.gitattributes: 100% 1.52k/1.52k [00:00<00:00, 9.60MB/s]
Fetching 23 files:   4% 1/23 [00:00<00:10,  2.02it/s]
DVAE.pt:   0% 0.00/27.7M [00:00<?, ?B/s]
Decoder.pt:  10% 10.5M/104M [00:00<00:02, 43.3MB/s]
Embed.safetensors:   0% 0.00/146M [00:00<?, ?B/s]
README.md: 100% 1.93k/1.93k [00:00<00:00, 10.6MB/s]
DVAE_full.pt:  17% 10.5M/60.4M [00:00<00:01, 41.2MB/s]
Decoder.safetensors:   0% 0.00/104M [00:00<?, ?B/s]
DVAE.pt:  38% 10.5M/27.7M [00:00<00:00, 41.8MB/s]
Decoder.pt:  20% 21.0M/104M [00:00<00:01, 41.5MB/s]
Embed.safetensors:   7% 10.5M/146M [00:00<00:03, 37.6MB/s]


## Run the WebUI

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [4]:
!nvidia-smi

Thu Oct 31 09:13:55 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
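
The header row of the `nvidia-smi` table carries the driver version and the highest CUDA version that driver supports, which is worth comparing against the `cu121` wheels installed earlier. As a rough sketch, assuming the header keeps the layout shown above, those fields can be pulled out with a regular expression. `parse_smi_header` is a hypothetical helper name.

```python
import re

# Matches the header row of the nvidia-smi table shown above.
HEADER_PATTERN = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return a dict with 'smi', 'driver' and 'cuda' versions, or None."""
    match = HEADER_PATTERN.search(text)
    return match.groupdict() if match else None


# Header row copied from the output above.
sample = (
    "| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   "
    "CUDA Version: 12.2     |"
)
print(parse_smi_header(sample))
```

In a live session the input would come from the real command output, for example through `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.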
                                                                    

In [None]:
!python webui.py --share --language=zh-CN

2024-10-31 09:14:15,311 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2024-10-31 09:14:32,003 - root - INFO - New registry table added: preprocessor_classes
2024-10-31 09:14:33,265 - root - INFO - New registry table added: adaptor_classes
2024-10-31 09:14:34,050 - root - INFO - New registry table added: lid_predictor_classes
2024-10-31 09:14:34,323 - datasets - INFO - PyTorch version 2.5.0+cu121 available.
2024-10-31 09:14:34,339 - datasets - INFO - Polars version 1.7.1 available.
2024-10-31 09:14:34,340 - datasets - INFO - TensorFlow version 2.17.0 available.
2024-10-31 09:14:34,342 - datasets - INFO - JAX version 0.4.33 available.
2024-10-31 09:14:36.131692: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-31 09:14:36.357186: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting 
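
The startup output mixes two line formats: records in the `timestamp - logger - LEVEL - message` layout, and lines written by native code under `external/local_xla`. The following sketch, which assumes the first layout stays exactly as shown above, separates the two so the structured records can be filtered by logger or level. `parse_log_line` is a hypothetical helper name.

```python
import re

# Layout of the structured records above:
# "2024-10-31 09:14:15,311 - numexpr.utils - INFO - message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a structured record, or None for other lines."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Both lines are copied from the output above.
sample_lines = [
    "2024-10-31 09:14:15,311 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.",
    "2024-10-31 09:14:36.131692: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] "
    "Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT "
    "when one has already been registered",
]

for line in sample_lines:
    record = parse_log_line(line)
    if record:
        print(record["level"], record["logger"], record["message"])
    else:
        print("(other format)", line[:40])
```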
