r-dh/dutch-vl-tts

dutch-vl-tts

This dataset contains 15,000 audio fragments of a male Flemish Dutch voice. The sentences read are taken from the Mozilla Common Voice project.

Dataset: Google Drive (1.5GB)

New: the dataset is now also available for preview on Kaggle

To use this dataset with Mozilla TTS, append the following fragment to TTS/tts/datasets/preprocess.py:

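def rdh_flemish(root_path, meta_file):
    """Load the dataset as a list of [text, wav_file, speaker_name] items."""
    txt_file = os.path.join(root_path, meta_file)
    speaker_name = "rdh_flemish"
    items = []
    with open(txt_file, 'r', encoding="utf-8") as f:
        for line in f:
            # Each line holds "<file name without extension>|<sentence>";
            # strip the trailing newline so it does not end up in the text.
            cols = line.strip().split("|")
            text = cols[1]
            wav_file = os.path.join(root_path, cols[0] + ".wav")
            items.append([text, wav_file, speaker_name])
    return items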
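
Before training, it can help to confirm that every entry in the metadata file points at an existing audio file. The following is a minimal sketch, not part of Mozilla TTS: the helper name `find_missing_wavs` is hypothetical, and it assumes the same `file name|sentence` layout that the formatter above reads.

```python
import os


def find_missing_wavs(root_path, meta_file):
    """Return the WAV paths referenced in the metadata file that are not on disk."""
    missing = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumed layout (same as the formatter): file name without extension | sentence
            file_id = line.split("|")[0]
            wav_file = os.path.join(root_path, file_id + ".wav")
            if not os.path.isfile(wav_file):
                missing.append(wav_file)
    return missing
```

An empty list means every referenced fragment was found under `root_path`.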

Audio

Files in the dataset are 16-bit mono WAV, downsampled from 44.1 kHz to 22,050 Hz.

Unfortunately, the audio samples may vary slightly between recording sessions.
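
The stated format can be checked with Python's standard `wave` module. This is a small sketch under the assumption that the files are plain PCM WAV; the function name `check_wav_format` is hypothetical.

```python
import wave


def check_wav_format(path, rate=22050, channels=1, sample_width=2):
    """Return True if the WAV file has the expected rate, channel count and bit depth."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == rate
            and w.getnchannels() == channels
            and w.getsampwidth() == sample_width  # 2 bytes per sample = 16-bit
        )
```

Running this over the dataset before training is a cheap way to catch a file that was exported with different settings.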

Trained models

Models with their corresponding synthesised audio samples are provided in the links below.

Original dataset:

Other dataset:

Due to a severe lack of quality data (4,000 noise-gated fragments), the second dataset has not been released. The first model was used for transfer learning, but this still proved insufficient.
