Librilight Preprocessed

This is a preprocessed Libri-Light dataset: the audio was transcribed with the Whisper Large model and then aligned with the Montreal Forced Aligner.

Warning

This attempt to build a good, large aligned dataset failed because the quality of Whisper ASR is too low for this task. Use Libriheavy instead.

Dataset Structure

The structure is similar to the original dataset. Each file is represented in three formats: .flac with the audio, .txt with the text, and .TextGrid with the alignment. Top-level folders are speakers, the next level is a session, and inside each session are the files, split into segments of up to 30 seconds, roughly 15 seconds on average.
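As a quick sanity check of this layout, the sketch below lists every .flac file that lacks a sibling .txt or .TextGrid. The function name check_triplets and the example path are hypothetical, and the assumption that the three files share a base name within one folder is inferred from the description above rather than confirmed by the repository.

```shell
# Report utterances whose transcript or alignment is missing.
# Assumed layout: <root>/<speaker>/<session>/<name>.{flac,txt,TextGrid}
check_triplets() {
    root="$1"
    find "$root" -type f -name '*.flac' | sort | while IFS= read -r flac; do
        base="${flac%.flac}"   # strip the extension to get the shared base name
        [ -f "$base.txt" ] || echo "missing text: $base.txt"
        [ -f "$base.TextGrid" ] || echo "missing alignment: $base.TextGrid"
    done
}

# Usage (the path is a placeholder for wherever the dataset was downloaded):
# check_triplets ./librilight-processed
```

No output means every audio file has both companions.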

Downloads

This dataset can be downloaded with my datasets tool using one of these identifiers: librilight-processed, librilight-processed@medium, or librilight-processed@large.

Alternatively, it can be downloaded directly from my server.

Reproduction

To reproduce the dataset you need to execute the following steps:

datasets sync # Download source datasets (depends on your network)
./prepare_cut.sh # Cut the audio files (fast)
./prepare_transcribe.sh # Transcribe the audio files using Whisper (can take days and requires GPUs)
./prepare_align.sh # Align the transcribed files using the Montreal Forced Aligner (can take days)
./prepare_final.sh # Prepare the final datasets
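Because some of these steps run for days, it can help to chain them so that a failure stops the pipeline instead of feeding incomplete output to the next step. The following is a minimal sketch, not part of the repository; the helper name run_steps is hypothetical.

```shell
# Run each step in order and stop at the first one that fails.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        if ! sh -c "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Usage, from the repository root:
# run_steps "datasets sync" ./prepare_cut.sh ./prepare_transcribe.sh ./prepare_align.sh ./prepare_final.sh
```

Each argument is passed to sh -c, so a step with arguments (such as datasets sync) has to be quoted as a single string.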

License

MIT

ex3ndr/supervoice-librilight-preprocessed
