<a href="https://colab.research.google.com/github/frank25184/1000-AI-collection-tools/blob/main/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🗣️ Speech-AI-Forge Colab

👋 This notebook is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If the project helps you, please give it a star on GitHub! Pull requests and issues are welcome too.

## How to Run

1. In the menu bar, select **Runtime**.
2. Click **Run all**.

Once the cells have run, look for the following line in the log output below:

```
Running on public URL: https://**.gradio.live
```

This link is the public URL where you can open the WebUI.
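
When the log is long, the link can also be pulled out programmatically. The following is a minimal sketch, assuming the log has already been captured as a string; `find_public_url` is a hypothetical helper (not part of Speech-AI-Forge) and the sample log text is made up for illustration.

```python
import re

# Pattern for the share link line shown above, e.g. https://abc123.gradio.live
GRADIO_URL = re.compile(r"Running on public URL:\s*(https://\S+\.gradio\.live)")


def find_public_url(log_text):
    """Return the first Gradio share URL found in log_text, or None."""
    match = GRADIO_URL.search(log_text)
    return match.group(1) if match else None


# Made-up sample log, for illustration only
sample_log = (
    "some earlier log line\n"
    "Running on public URL: https://abc123.gradio.live\n"
)
print(find_public_url(sample_log))
```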

> Note: if you are prompted to restart the runtime while packages are installing, choose "No".

## Environment Setup

In [6]:
# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install PyTorch
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 5. Install dependencies
!pip install -r requirements.txt


fatal: destination path 'Speech-AI-Forge' already exists and is not an empty directory.
/content/Speech-AI-Forge
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Fetched 129 kB in 1s (110 kB/s)
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu
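
The `fatal: destination path 'Speech-AI-Forge' already exists` line at the top of this output appears when the cell is re-run in a session where the repository was already cloned. It is harmless here, since the following `%cd` still succeeds. A minimal sketch of a re-runnable clone step is shown below; `clone_once` and its `run` parameter are hypothetical names introduced for illustration, not part of the project.

```python
import os
import subprocess

REPO_URL = "https://github.com/lenML/Speech-AI-Forge"


def clone_once(url=REPO_URL, dest="Speech-AI-Forge", run=subprocess.run):
    """Clone url into dest unless dest already exists.

    Returns True if a clone was started, False if it was skipped.
    The run argument exists only so the function can be exercised without git.
    """
    if os.path.isdir(dest):
        print(f"{dest} already exists, skipping clone")
        return False
    run(["git", "clone", url, dest], check=True)
    return True
```

In a notebook, calling `clone_once()` before `%cd Speech-AI-Forge` would replace the bare `!git clone` line.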

## Download Models

In [7]:
# @markdown ## Model download notes
# @markdown Most models are close to 2 GB each, so make sure you have enough storage space and network bandwidth.  <br/>
# @markdown **Note**: at least one TTS model must be selected. If none is selected, ChatTTS is downloaded by default.

# @markdown ### TTS models
# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean", "placeholder":"Download the ChatTTS model"}
# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = False  # @param {"type":"boolean"}
# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}
# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown ### ASR models
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Voice cloning models
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = True  # @param {"type":"boolean"}

# @markdown ### Enhancement models
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}

def download_model(command):
    # `!{command}` is IPython shell syntax, so this helper only works inside a notebook cell
    print(f"Executing: {command}")
    !{command}

# Make sure at least one TTS model is selected
if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):
    print("No TTS model selected; downloading ChatTTS by default...")
    download_chattts = True

# Download TTS models
if download_chattts:
    download_model("python -m scripts.dl_chattts --source huggingface")

if download_fish_speech:
    download_model("python -m scripts.downloader.fish_speech_1_2sft --source huggingface")

if download_cosyvoice:
    download_model("python -m scripts.dl_cosyvoice_instruct --source huggingface")

if download_fire_red_tts:
    download_model("python -m scripts.downloader.fire_red_tts --source huggingface")

if download_f5_tts:
    download_model("python -m scripts.downloader.f5_tts --source huggingface")
    download_model("python -m scripts.downloader.vocos_mel_24khz --source huggingface")

# Download ASR models
if download_whisper:
    download_model("python -m scripts.downloader.faster_whisper --source huggingface")

# Download voice cloning models
if download_open_voice:
    download_model("python -m scripts.downloader.open_voice --source huggingface")

# Download enhancement models
if download_enhancer:
    download_model("python -m scripts.dl_enhance --source huggingface")

print("All selected models have been downloaded!")


Executing: python -m scripts.dl_chattts --source huggingface
Model ChatTTS already exists.
Executing: python -m scripts.downloader.faster_whisper --source huggingface
Model faster-whisper-large-v3 already exists.
Executing: python -m scripts.dl_enhance --source huggingface
Model resemble-enhance already exists.
All selected models have been downloaded!
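
The flag-by-flag selection in the cell above can also be written as a lookup table, which keeps the "at least one TTS model" fallback in one place. This is a sketch only: the module paths are copied from the cell above, while `DOWNLOADERS`, `TTS_MODELS`, and `build_commands` are hypothetical names introduced here. It only builds the command strings and does not run them.

```python
# Module paths are taken from the download cell above; the table itself is illustrative.
DOWNLOADERS = {
    "chattts": ["scripts.dl_chattts"],
    "fish_speech": ["scripts.downloader.fish_speech_1_2sft"],
    "cosyvoice": ["scripts.dl_cosyvoice_instruct"],
    "fire_red_tts": ["scripts.downloader.fire_red_tts"],
    "f5_tts": ["scripts.downloader.f5_tts", "scripts.downloader.vocos_mel_24khz"],
    "whisper": ["scripts.downloader.faster_whisper"],
    "open_voice": ["scripts.downloader.open_voice"],
    "enhancer": ["scripts.dl_enhance"],
}
TTS_MODELS = {"chattts", "fish_speech", "cosyvoice", "fire_red_tts", "f5_tts"}


def build_commands(selected, source="huggingface"):
    """Return the download commands for the selected model names, in table order."""
    selected = set(selected)
    # Same fallback as the cell above: default to ChatTTS when no TTS model is chosen
    if not selected & TTS_MODELS:
        selected.add("chattts")
    return [
        f"python -m {module} --source {source}"
        for name, modules in DOWNLOADERS.items()
        if name in selected
        for module in modules
    ]


for command in build_commands({"whisper", "enhancer"}):
    print(command)
```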


## Run the WebUI

In [None]:
!nvcc --version

In [9]:
!nvidia-smi

Tue Nov 12 13:19:54 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
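
The `CUDA Version` field in the `nvidia-smi` header is the highest CUDA version the installed driver supports, so it can be compared against the CUDA build of the PyTorch wheels installed earlier (`cu121`, i.e. CUDA 12.1). The sketch below is a rough sanity check under that assumption, not a full compatibility test; `parse_smi_header` and `version_tuple` are hypothetical helpers, and the header line is copied from the output above.

```python
import re


def parse_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    return (match.group(1), match.group(2)) if match else None


def version_tuple(version):
    """Turn a dotted version string such as '12.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Header line copied from the nvidia-smi output above
header = "| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |"
driver, cuda = parse_smi_header(header)

# Assumption: the wheels come from the cu121 index used in the setup cell
wheel_cuda = "12.1"
driver_is_new_enough = version_tuple(cuda) >= version_tuple(wheel_cuda)
print(driver, cuda, driver_is_new_enough)
```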
                                                                    

In [None]:
!python webui.py --share --language=zh-CN

2024-11-12 13:19:57,110 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2024-11-12 13:20:04,383 - root - INFO - New registry table added: preprocessor_classes
Key Conformer already exists in model_classes, re-register
2024-11-12 13:20:05,492 - root - INFO - New registry table added: adaptor_classes
Key Linear already exists in adaptor_classes, re-register
Key TransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolution2DTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register
2024-11-12 13:20:06,651 - root - INFO - New registry table added: lid_predictor_classes
2024-11-12 13:20:06,838 - datasets - INFO - PyTorch version 2.5.0+cu121 available.
2024-11-12 13:20:06,840 - d