# 🗣️ Speech-AI-Forge Colab

👋本脚本基于 [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge) 构建。如果此项目对你有帮助，欢迎到 github 为我们 star 支持！也欢迎提交 pr issues~

## 运行指南

1. 在菜单栏中选择 **代码执行程序**。
2. 点击 **全部运行**。

运行完成后，请在下方日志中找到如下信息：

```
Running on public URL: https://**.gradio.live
```

该链接即为您可以访问的公网地址。

> 注意：如果在安装包时提示需要重启，请选择 "否"。

## 环境部署

In [2]:
# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install PyTorch
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 5. install onnx onnxconverter_common

!pip install onnx
!pip install onnxconverter_common

# 6. Install dependencies
!pip install -r requirements.txt



Cloning into 'Speech-AI-Forge'...
remote: Enumerating objects: 5895, done.[K
remote: Counting objects: 100% (351/351), done.[K
remote: Compressing objects: 100% (121/121), done.[K
remote: Total 5895 (delta 249), reused 242 (delta 230), pack-reused 5544 (from 1)[K
Receiving objects: 100% (5895/5895), 9.18 MiB | 13.24 MiB/s, done.
Resolving deltas: 100% (4075/4075), done.
/content/Speech-AI-Forge
Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease


Collecting pydub (from -r requirements.txt (line 4))
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting fastapi (from -r requirements.txt (line 5))
  Downloading fastapi-0.115.6-py3-none-any.whl.metadata (27 kB)
Collecting omegaconf (from -r requirements.txt (line 7))
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting pypinyin (from -r requirements.txt (line 8))
  Downloading pypinyin-0.53.0-py2.py3-none-any.whl.metadata (12 kB)
Collecting vocos (from -r requirements.txt (line 9))
  Downloading vocos-0.1.0-py3-none-any.whl.metadata (4.8 kB)
Collecting vector_quantize_pytorch (from -r requirements.txt (line 11))
  Downloading vector_quantize_pytorch-1.20.11-py3-none-any.whl.metadata (29 kB)
Collecting transformers~=4.41.1 (from -r requirements.txt (line 13))
  Downloading transformers-4.41.2-py3-none-any.whl.metadata (43 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.8/43.8 kB[0m [31m3.9 MB/s[0m eta [36m0:00:

## 下载模型

In [2]:
# @markdown ## 模型下载说明
# @markdown 大部分模型的大小接近 2GB，请确保有足够的存储空间和网络带宽。  <br/>
# @markdown **注意**：至少必须选择一个 TTS 模型。如果没有选择，将默认下载 ChatTTS。

# @markdown ### TTS 模型
# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean", "placeholder":"下载 ChatTTS 模型"}
# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = True  # @param {"type":"boolean"}
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = True  # @param {"type":"boolean"}
# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = True  # @param {"type":"boolean"}
# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = True  # @param {"type":"boolean"}

# @markdown ### ASR 模型
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Clone Voice 模型
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = True  # @param {"type":"boolean"}

# @markdown ### 增强模型
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}

def download_model(command):
    print(f"Executing: {command}")
    !{command}

# 检查是否至少选择了一个 TTS 模型
if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):
    print("未选择任何 TTS 模型，默认下载 ChatTTS...")
    download_chattts = True

# TTS 模型下载
if download_chattts:
    download_model("python -m scripts.dl_chattts --source huggingface")

if download_fish_speech:
    download_model("python -m scripts.downloader.fish_speech_1_2sft --source huggingface")

if download_cosyvoice:
    download_model("python -m scripts.dl_cosyvoice_instruct --source huggingface")

if download_fire_red_tts:
    download_model("python -m scripts.downloader.fire_red_tts --source huggingface")

if download_f5_tts:
    download_model("python -m scripts.downloader.f5_tts --source huggingface")
    download_model("python -m scripts.downloader.vocos_mel_24khz --source huggingface")

# ASR 模型下载
if download_whisper:
    download_model("python -m scripts.downloader.faster_whisper --source huggingface")

# Clone Voice 模型下载
if download_open_voice:
    download_model("python -m scripts.downloader.open_voice --source huggingface")

# 增强模型下载
if download_enhancer:
    download_model("python -m scripts.dl_enhance --source huggingface")

print("所有选定的模型已下载完成！")


Executing: python -m scripts.dl_chattts --source huggingface
/usr/bin/python3: No module named scripts.dl_chattts
Executing: python -m scripts.downloader.fish_speech_1_2sft --source huggingface
/usr/bin/python3: Error while finding module specification for 'scripts.downloader.fish_speech_1_2sft' (ModuleNotFoundError: No module named 'scripts.downloader')
Executing: python -m scripts.dl_cosyvoice_instruct --source huggingface
/usr/bin/python3: No module named scripts.dl_cosyvoice_instruct
Executing: python -m scripts.downloader.fire_red_tts --source huggingface
/usr/bin/python3: Error while finding module specification for 'scripts.downloader.fire_red_tts' (ModuleNotFoundError: No module named 'scripts.downloader')
Executing: python -m scripts.downloader.f5_tts --source huggingface
/usr/bin/python3: Error while finding module specification for 'scripts.downloader.f5_tts' (ModuleNotFoundError: No module named 'scripts.downloader')
Executing: python -m scripts.downloader.vocos_mel_24khz -

## 运行 WebUI

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [4]:
!nvidia-smi

Sun Dec 15 07:29:42 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

# 配置代理供外部访问

In [7]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok http 7860 &')

--2024-12-15 08:44:41--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 13.248.244.96, 99.83.220.108, 75.2.60.68, ...
Connecting to bin.equinox.io (bin.equinox.io)|13.248.244.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13921656 (13M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip’


2024-12-15 08:44:42 (16.5 MB/s) - ‘ngrok-stable-linux-amd64.zip’ saved [13921656/13921656]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


# **启动WEB服务**

In [5]:
!python webui.py --share --language=zh-CN

python3: can't open file '/content/webui.py': [Errno 2] No such file or directory
