seed-tts-eval

💥 This repository contains the speech generation test set proposed in our work Seed-TTS, as well as the scripts for metric calculation. For security reasons, the source code and model weights of Seed-TTS will not be released. You are welcome to try the speech generation feature in ByteDance products. 💥

To evaluate the zero-shot speech generation ability of our model, we propose an out-of-domain objective evaluation test set. This test set consists of samples extracted from English (EN) and Mandarin (ZH) public corpora that are used to measure the model's performance on various objective metrics. Specifically, we employ 1,000 samples from the Common Voice dataset and 2,000 samples from the DiDiSpeech-2 dataset.

Requirements

To install all dependencies, run:

pip3 install -r requirements.txt

Metrics

The word error rate (WER) and speaker similarity (SIM) metrics are adopted for objective evaluation.

For WER, we employ Whisper-large-v3 and Paraformer-zh as the automatic speech recognition (ASR) engines for English and Mandarin, respectively.
For SIM, we use WavLM-large fine-tuned on the speaker verification task (model link) to extract speaker embeddings, and then compute the cosine similarity between the embedding of each generated utterance and that of its reference clip.
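
As a rough illustration of what the two metrics measure, here is a minimal pure-Python sketch. It is not the code used by the scripts in this repository: the function names are hypothetical, the text normalization applied before scoring is omitted, and the `per_character` switch for Mandarin is an assumption about how text without word boundaries might be scored. In the real pipeline the hypothesis comes from the ASR engine and the vectors from the speaker verification model.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # token of ref missing in hyp (deletion)
                curr[j - 1] + 1,     # extra token in hyp (insertion)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, per_character=False):
    """Error rate = edit distance / number of reference tokens.

    per_character=True scores individual characters, an assumed
    convention for text without spaces between words.
    """
    if per_character:
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        ref = reference.split()
        hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

A lower WER means the synthesized speech is transcribed closer to the target text, while a SIM closer to 1 means the generated voice is closer to the reference speaker.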

Dataset

You can download the test set for all tasks from this link. The test set is organized through meta files. Each line of a meta file has the following fields: filename | the text of the prompt | the audio of the prompt | the text to be synthesized | the ground-truth counterpart of the text to be synthesized (if it exists). Different tasks use different meta files:

Zero-shot text-to-speech (TTS):
- EN: seed-tts-eval/en/meta.lst
- ZH: seed-tts-eval/zh/meta.lst
- ZH (hard case): seed-tts-eval/zh/hardcase.lst
Zero-shot voice conversion (VC):
- EN: seed-tts-eval/en/non_para_reconstruct_meta.lst
- ZH: seed-tts-eval/zh/non_para_reconstruct_meta.lst
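
The meta line layout described above could be read with a small parser like the following sketch. It assumes that fields are separated by a bare `|` with no escaping and that the last field may be absent or empty; `MetaEntry`, `parse_meta_line`, and `load_meta` are hypothetical names, not part of this repository.

```python
from typing import List, NamedTuple, Optional


class MetaEntry(NamedTuple):
    filename: str                # first field of the line
    prompt_text: str             # the text of the prompt
    prompt_audio: str            # the audio of the prompt
    target_text: str             # the text to be synthesized
    ground_truth: Optional[str]  # ground-truth counterpart, if it exists


def parse_meta_line(line: str) -> MetaEntry:
    """Split one meta line into its four or five fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}")
    ground_truth = fields[4] if len(fields) == 5 and fields[4] else None
    return MetaEntry(fields[0], fields[1], fields[2], fields[3], ground_truth)


def load_meta(path: str) -> List[MetaEntry]:
    """Parse every non-empty line of a meta file."""
    with open(path, encoding="utf-8") as f:
        return [parse_meta_line(line) for line in f if line.strip()]
```

For example, with made-up values, `parse_meta_line("utt_0001|hello there|prompts/utt_0001.wav|good morning")` yields an entry whose `ground_truth` is `None`.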

Code

We also release the evaluation code for both metrics:

# WER
bash cal_wer.sh {the path of the meta file} {the directory of synthesized audio} {language: zh or en}
# SIM
bash cal_sim.sh {the path of the meta file} {the directory of synthesized audio} {path/wavlm_large_finetune.pth}
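
Before running either script, it can help to check that the directory of synthesized audio covers every entry of the meta file. The sketch below assumes that each output is stored as the first meta field plus a `.wav` extension; that naming convention, the `extension` parameter, and the `missing_outputs` helper are assumptions for illustration and should be adjusted to match how your outputs are actually named.

```python
import os


def missing_outputs(meta_path, synth_dir, extension=".wav"):
    """Return the meta filenames that have no audio file in synth_dir.

    Assumes outputs are named <first meta field><extension>.
    """
    missing = []
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            utt_id = line.split("|")[0].strip()
            if not os.path.isfile(os.path.join(synth_dir, utt_id + extension)):
                missing.append(utt_id)
    return missing
```

An empty list means every meta entry has a matching file; otherwise the returned names show which utterances still need to be synthesized.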

