rmelnet

Experimental dump of R-MelNet related code and demo files

Listen to samples on the project site, https://kastnerkyle.github.io/rmelnet/, or see the instructions below.

Experimental code and a model are released, but a directly runnable inference pipeline is still TODO; see raw_code/ for details.

See samples/melnet_trunc_pt33 for samples from the R-MelNet pipeline. The tts*.wav files are the initial TTS outputs (generated via HTS) that were used to extract the initial pronunciation / phonemization of the text. The raw*.wav files are the model outputs with the priming trimmed, cut off based on the attention termination.

concat.wav is the concatenation of all the raw files, created with the command below. It is useful for hearing variability across samples.

ffmpeg -f concat -safe 0 -i <( for f in $(ls */raw*.wav | sort -n -t "_" -k2); do echo "file '$(pwd)/$f'"; done ) output.wav
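
The one-liner above relies on bash process substitution and on parsing `ls` output. As a rough sketch of the same idea, the script below writes the concat list to a file first, which makes the ordering easy to inspect before running ffmpeg. The function names `make_concat_list` and `concat_wavs` are hypothetical and not part of this repository; the sketch assumes bash, an `ffmpeg` binary on the PATH, and file paths that contain no single quotes or newlines.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not from the rmelnet repo.

# make_concat_list ROOT LIST_FILE
# Writes one "file '<absolute path>'" line per ROOT/*/raw*.wav into LIST_FILE,
# ordered with the same `sort -n -t "_" -k2` key as the one-liner above.
make_concat_list() {
    local root list f rel
    root=$(cd "$1" && pwd) || return 1
    list=$2
    : > "$list"
    # Use a glob instead of parsing `ls`; sort on the relative paths.
    for f in "$root"/*/raw*.wav; do
        [ -e "$f" ] || continue          # skip the unexpanded pattern if nothing matches
        printf '%s\n' "${f#"$root"/}"
    done | sort -n -t "_" -k2 | while IFS= read -r rel; do
        printf "file '%s/%s'\n" "$root" "$rel"
    done >> "$list"
}

# concat_wavs ROOT OUTPUT_WAV
# Builds a temporary list and hands it to ffmpeg's concat demuxer.
concat_wavs() {
    local list status
    list=$(mktemp) || return 1
    make_concat_list "$1" "$list" && ffmpeg -f concat -safe 0 -i "$list" "$2"
    status=$?
    rm -f "$list"
    return $status
}
```

After sourcing the script, `concat_wavs . output.wav` run from the samples directory should behave like the original command, and `make_concat_list . list.txt` alone lets you check the file order without invoking ffmpeg.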

Baseline comparisons for FastSpeech 2 and PortaSpeech were generated from their Hugging Face pages, at https://huggingface.co/facebook/fastspeech2-en-ljspeech and https://huggingface.co/spaces/NATSpeech/PortaSpeech respectively.
