Release V1.11 · fishaudio/fish-diffusion

We are happy to announce that a pre-trained model is now available. With it, you only need about 30 minutes of audio data and 15 minutes of fine-tuning (on a 3090) to reproduce the voice timbre you want.

We recommend that you refer to the attached config for fine-tuning. It changes the learning rate scheduler and the number of steps between checkpoint saves.

Model Info

Dataset Size: ~300 hours, ~600 singers (M4Singer, OpenSinger, OpenCpop, and in-house data)
Vocoder: NSF HifiGAN 44.1 kHz (OpenVPI)
Feature Extractor: Chinese Hubert Soft with gate size 25
MD5: 9d88f1bbca34053919ee1ea8bd780a9b
Steps: 260k on a 4 x RTX A6000 server
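
Because the release lists an MD5 for the checkpoint, it may be worth checking the downloaded file against it before fine-tuning. The following is a minimal sketch using only the Python standard library. The function names and the file name in the usage comment are hypothetical and are not part of fish-diffusion.

```python
import hashlib
from pathlib import Path

# MD5 published in the release notes for the pre-trained checkpoint.
EXPECTED_MD5 = "9d88f1bbca34053919ee1ea8bd780a9b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example usage (the file name is hypothetical; use your actual download path):
# if not checkpoint_matches("pretrained.ckpt"):
#     raise SystemExit("MD5 mismatch: re-download the checkpoint")
```

A mismatch usually indicates an incomplete or corrupted download. MD5 is suitable here only as an integrity check, not as protection against tampering.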

This model is released under the CC-BY-NC-SA 4.0 license. Please read the license carefully before downloading.
