Add Talromur2 recipe #4680
Merged
43 commits (all by G-Thor):
07d151c add talromur to db.sh
8152202 Merge branch 'master' of https://github.com/cadia-lvl/espnet
2a1a407 Integrate icelandic g2p
2f2ed4b Add talromur recipe
299e061 Add progress bar to phonemization
6348585 Remove unused variables from talromur recipe
16f47f7 Merge branch 'master' of https://github.com/espnet/espnet into pull_u…
b467140 reformatted with black
7752729 Set cluster settings to default, add name to TODO
49a284e Fix python formatting
87b7b25 Add data_download.sh for talromur recipe.
67e9836 Added multi-speaker VITS training recipe
6320f0b Updated ice-g2p integration to reflect recent changes
759cec0 add talromur to db.sh
d7a3946 Integrate icelandic g2p
2f27743 Add talromur recipe
5009416 Add progress bar to phonemization
c6b4db3 Remove unused variables from talromur recipe
2f2c8db reformatted with black
bf449bd Set cluster settings to default, add name to TODO
3aae2f7 Fix python formatting
bcead94 Add data_download.sh for talromur recipe.
ab374fa Added multi-speaker VITS training recipe
a3658ad Updated ice-g2p integration to reflect recent changes
cd7078d Fix merge conflicts in db.sh
512b857 refactor ice-g2p integration
49d846a initial commit. Not complete yet
bbb8111 add conf files and implement data scripts
c61baeb Implement talromur2 recipe data preprocessing scripts
1ccd7e5 Merge branch 'add_talromur2' of github.com:cadia-lvl/espnet into talr…
540433c Revert changes irrelevant to recipe
8b1f890 Remove unused conf files
d22126c Write README
9989814 Add Talromur 2 entry in egs2/README.md
0df138e Fix linter error E501
f7d6f33 Merge branch 'master' into talromur2
0649fee apply black to new script
161ab4c remove setup.sh
163e887 Apply changes from PR review
6a62e9b Update model download URL to correct version
500ac0d Make ice-g2p installer use official PyPI version
37cb63b Merge branch 'master' into talromur2
e52cfd8 Fix shellcheck errors and warnings
@@ -0,0 +1,51 @@
# Talrómur 2 recipe

This is a recipe for the [Talrómur 2 corpus](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/167), an Icelandic speech corpus intended for multi-speaker TTS development. It contains studio recordings of speech from 40 Icelandic speakers, 20 male and 20 female.

## Corpus
The corpus contains 56,225 studio-recorded, single-sentence audio clips. Each speaker contributes between 929 and 1879 clips.

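The per-speaker figures above can be sanity-checked once the data has been prepared. The sketch below assumes the recipe's data directories follow the usual Kaldi layout, with an `utt2spk` file holding one `<utterance-id> <speaker-id>` pair per line; the `clip_stats` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize clips per speaker from a Kaldi-style utt2spk file.
# Prints "<num-speakers> <num-clips> <min-clips-per-speaker> <max-clips-per-speaker>".
clip_stats() {
    local utt2spk="$1"
    awk '
        { count[$2]++; total++ }          # column 2 is the speaker ID
        END {
            n = 0
            for (spk in count) {
                if (n == 0 || count[spk] < lo) lo = count[spk]
                if (n == 0 || count[spk] > hi) hi = count[spk]
                n++
            }
            printf "%d %d %d %d\n", n, total, lo, hi
        }
    ' "$utt2spk"
}
```

Run over a file that covers every split (for instance, the concatenation of all `utt2spk` files; paths are up to your setup), the output would be expected to match the corpus figures quoted above.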
| Male voices | Female voices |
|---|---|
| s124, s176, s178, s181, s188 | s146, s180, s186, s208, s209 |
| s206, s220, s225, s234, s235 | s214, s215, s221, s264, s268 |
| s157, s162, s216, s222, s223 | s169, s185, s187, s200, s226 |
| s231, s236, s240, s250, s273 | s228, s247, s251, s256, s258 |

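The speaker table can also be expressed as a Kaldi-style `spk2gender` file (one `<speaker-id> <m|f>` pair per line), which is convenient for checking gender balance in a split. This is a sketch: the `write_spk2gender` helper is hypothetical, and the recipe itself is not known to require such a file.

```bash
#!/usr/bin/env bash
# Speaker IDs copied from the table above (20 male, 20 female).
MALE_SPEAKERS=(s124 s176 s178 s181 s188 s206 s220 s225 s234 s235
               s157 s162 s216 s222 s223 s231 s236 s240 s250 s273)
FEMALE_SPEAKERS=(s146 s180 s186 s208 s209 s214 s215 s221 s264 s268
                 s169 s185 s187 s200 s226 s228 s247 s251 s256 s258)

# Hypothetical helper: print a spk2gender mapping, sorted by speaker ID.
write_spk2gender() {
    {
        printf '%s m\n' "${MALE_SPEAKERS[@]}"    # format is reused for every ID
        printf '%s f\n' "${FEMALE_SPEAKERS[@]}"
    } | LC_ALL=C sort
}
```

Redirecting the output, e.g. `write_spk2gender > spk2gender`, gives a 40-line file with 20 entries per gender.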
Since this corpus is intended for multi-speaker and speaker-adaptive TTS, no attempt is made to train single-speaker TTS models on the individual voices in the corpus.

A more detailed description of the corpus can be found in its [README file (download link)](https://repository.clarin.is/repository/xmlui/bitstream/handle/20.500.12537/167/README.md) and in [the official repository](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/167).

## Dependencies
This recipe relies on the [Ice-G2P](https://github.com/grammatek/ice-g2p) package for G2P transcription. Install it by running the installation script `installers/install_ice_g2p.sh` from the tools directory; for example, from this directory:
```
cd ../../../tools
installers/install_ice_g2p.sh
```

## Usage
The recipe provides two training scripts: `train_multi_speaker_fastspeech2.sh` and `train_multi_speaker_tacotron2.sh`. Training a FastSpeech 2 model with the current implementation requires a teacher model that provides phoneme durations, so a Tacotron 2 model must already have been trained before a FastSpeech 2 model can be trained.

First, download the data by running the following:
```
. ./db.sh
local/data_download.sh ${TALROMUR2}
```
To train a Tacotron 2 model, run `./train_multi_speaker_tacotron2.sh`.
Once a Tacotron 2 model has been trained, run `./train_multi_speaker_fastspeech2.sh` to obtain a FastSpeech 2 model.

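The ordering constraint described in this section (download the data, train the Tacotron 2 teacher, then train FastSpeech 2) can be enforced with a small driver that stops at the first failing stage. A minimal sketch, assuming the script names listed at the start of this section and that each script exits with a non-zero status on failure; the `run_talromur2_recipe` function is hypothetical and not part of the recipe.

```bash
#!/usr/bin/env bash
# Hypothetical driver: run the recipe stages in dependency order and stop at
# the first failure, so FastSpeech 2 never starts without a trained teacher.
run_talromur2_recipe() {
    local recipe_dir="${1:-.}"
    (   # subshell: the cd and the variables from db.sh do not leak to the caller
        cd "${recipe_dir}" || exit 1
        . ./db.sh || exit 1                    # defines TALROMUR2
        if [ -z "${TALROMUR2:-}" ]; then
            echo "TALROMUR2 is not set in db.sh" >&2
            exit 1
        fi
        local/data_download.sh "${TALROMUR2}" || exit 1
        ./train_multi_speaker_tacotron2.sh || exit 1   # teacher for durations
        ./train_multi_speaker_fastspeech2.sh
    )
}
```

Calling `run_talromur2_recipe` from the recipe directory runs all three stages; its exit status is non-zero if any stage fails.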

---
## Pretrained models
Training outputs are available on Hugging Face for each speaker:

A fully trained x-vector-based Tacotron 2 model has been uploaded to Hugging Face: [Link](https://huggingface.co/espnet/talromur2_xvector_tacotron2).

This Tacotron 2 model was trained with slightly modified parameters for the ice-g2p `Transcriber` object, defined within `IsG2P` in `espnet2/text/phoneme_tokenizer.py`.

`word_sep=".", syllab_symbol=""` was used instead of the default `word_sep=",", syllab_symbol="."`. These nonstandard parameters must also be used when producing the phonemized inputs for inference with this model.

---

## Acknowledgments
This project was funded by the Language Technology Programme for Icelandic 2019-2023. The programme, which is managed and coordinated by Almannarómur, is funded by the Icelandic Ministry of Education, Science and Culture.
@@ -0,0 +1,110 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
#   run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
#   --time <time>: Limit the maximum time to execute.
#   --mem <mem>: Limit the maximum memory usage.
#   --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
#   --num-threads <nthreads>: Specify the number of CPU cores.
#   --gpu <ngpu>: Specify the number of GPU devices.
#   --config: Change the configuration file from the default.
#
# "JOB=1:10" is used for "array jobs" and controls the number of parallel jobs.
# The string to the left of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and the log file name,
# e.g. "echo JOB" becomes "echo 3" for the 3rd job and "echo 8" for the 8th job.
# Note that the range must start with a positive number, so you cannot use "JOB=0:10", for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl share a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, which
# are configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
#   "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================~


# Select the backend used by run.sh from "local", "stdout", "sge", "pbs", "slurm", or "ssh"
cmd_backend='local'

# Local machine, without any job scheduling system
if [ "${cmd_backend}" = local ]; then

    # The other usage
    export train_cmd="run.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="run.pl"
    # Used for "*_recog.py"
    export decode_cmd="run.pl"

# Local machine logging to stdout and log file, without any job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

    # The other usage
    export train_cmd="stdout.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="stdout.pl"
    # Used for "*_recog.py"
    export decode_cmd="stdout.pl"


# "qsub" (Sun Grid Engine, or a derivative of it)
elif [ "${cmd_backend}" = sge ]; then
    # The default setting is written in conf/queue.conf.
    # You must change "-q g.q" to the "queue" for your environment.
    # To list the "queue" names, type "qhost -q".
    # Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

    export train_cmd="queue.pl"
    export cuda_cmd="queue.pl"
    export decode_cmd="queue.pl"


# "qsub" (Torque/PBS)
elif [ "${cmd_backend}" = pbs ]; then
    # The default setting is written in conf/pbs.conf.

    export train_cmd="pbs.pl"
    export cuda_cmd="pbs.pl"
    export decode_cmd="pbs.pl"


# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
    # The default setting is written in conf/slurm.conf.
    # You must change "-p cpu" and "-p gpu" to the "partition" names for your environment.
    # To list the "partition" names, type "sinfo".
    # You can use "--gpu *" by default for slurm; it is interpreted as "--gres gpu:*".
    # The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

    export train_cmd="slurm.pl"
    export cuda_cmd="slurm.pl"
    export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
    # You have to create ".queue/machines" to specify the hosts on which to execute jobs.
    # e.g. .queue/machines
    #   host1
    #   host2
    #   host3
    # This assumes you can log in to them without a password, i.e. you have set up SSH keys.

    export train_cmd="ssh.pl"
    export cuda_cmd="ssh.pl"
    export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

    export train_cmd="queue.pl --mem 2G"
    export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf"
    export decode_cmd="queue.pl --mem 4G"

else
    echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
    return 1
fi
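The `JOB=1:<nj>` array-job convention described in the header comment of `cmd.sh` can be illustrated with a simplified, local-only emulation. This is a sketch of the substitution rule only, not the actual `run.pl` (a Perl script that also handles options, log headers and exit-status reporting); the `mini_run` function is hypothetical, and unlike the real tools it runs the argument vector directly, so shell syntax inside the command is not supported.

```bash
#!/usr/bin/env bash
# Simplified emulation of array-job handling:
#   mini_run JOB=1:<nj> <log> <command...>
# Every occurrence of the variable name (here "JOB") in the log path and in
# the command arguments is replaced by the job index; jobs run in parallel.
mini_run() {
    local spec="$1" log="$2"; shift 2
    local var="${spec%%=*}" range="${spec#*=}"
    local first="${range%%:*}" last="${range##*:}"
    if (( first < 1 )); then
        echo "mini_run: job indices must start at 1 or above" >&2
        return 1
    fi
    local n arg pid pids=() failed=0
    for (( n = first; n <= last; n++ )); do
        local job_log="${log//"$var"/$n}"
        local cmd=()
        for arg in "$@"; do cmd+=("${arg//"$var"/$n}"); done
        "${cmd[@]}" > "$job_log" 2>&1 &    # one background job per index
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    (( failed == 0 ))                      # non-zero status if any job failed
}
```

For example, `mini_run JOB=1:3 logs/echo.JOB.log echo JOB` writes `1`, `2` and `3` to `logs/echo.1.log` through `logs/echo.3.log` (the `logs/` directory must already exist).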
@@ -0,0 +1,15 @@
# This configuration is the basic decoding setting for Tacotron 2.
# It can also be applied to Transformer. If you encounter problems
# such as deletions or repetitions, it is worthwhile to try
# `use_att_constraint: true` to make the generation more stable.
# Note that the attention constraint is not supported in Transformer.

##########################################################
# DECODING SETTING #
##########################################################
threshold: 0.5 # threshold to stop the generation
maxlenratio: 10.0 # maximum length of generated samples = input length * maxlenratio
minlenratio: 0.0 # minimum length of generated samples = input length * minlenratio
use_att_constraint: false # whether to use the attention constraint introduced in Deep Voice 3
backward_window: 1 # backward window size in the attention constraint
forward_window: 3 # forward window size in the attention constraint
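The length limits and the attention-constraint window in this configuration reduce to simple arithmetic. A sketch, assuming that for an input of L tokens the decoder produces between `int(L * minlenratio)` and `int(L * maxlenratio)` frames, and that with `use_att_constraint: true` only encoder positions in the half-open range `[p - backward_window, p + forward_window)` around the last attended position p remain attendable. The window rule is our reading of ESPnet's Tacotron 2 attention code and should be checked against the source; both helper functions are hypothetical.

```bash
#!/usr/bin/env bash
# decode_length_bounds <input_len> <minlenratio> <maxlenratio>
# Prints "<minlen> <maxlen>" in output frames (fractions are truncated).
decode_length_bounds() {
    awk -v l="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { printf "%d %d\n", int(l * lo), int(l * hi) }'
}

# attention_window <last_idx> <input_len> <backward_window> <forward_window>
# Prints the first and last attendable encoder positions (0-based, inclusive),
# clamped to the valid range [0, input_len - 1].
attention_window() {
    local p="$1" len="$2" back="$3" fwd="$4"
    local first=$(( p - back )) last=$(( p + fwd - 1 ))
    (( first < 0 )) && first=0
    (( last > len - 1 )) && last=$(( len - 1 ))
    echo "$first $last"
}
```

With the values above, a 50-token input is capped at 500 frames (`decode_length_bounds 50 0.0 10.0` prints `0 500`), and if the last attended position is 5, positions 4 to 7 stay attendable.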
@@ -0,0 +1,7 @@
--sample-frequency=16000
--frame-length=25 # the default is 25
--low-freq=20 # the default.
--high-freq=7600 # the default is zero, meaning use the Nyquist frequency (8k in this case).
--num-mel-bins=30
--num-ceps=30
--snip-edges=false
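In this feature configuration the 30 triangular mel bins are spaced evenly on the mel scale between `--low-freq` (20 Hz) and `--high-freq` (7600 Hz), below the 8 kHz Nyquist frequency of 16 kHz audio. A sketch of how the bin centre frequencies follow, assuming Kaldi's mel formula mel(f) = 1127 ln(1 + f/700) and n bins defined by n + 2 equally spaced mel edges; the `mel_centers` helper is hypothetical.

```bash
#!/usr/bin/env bash
# mel_centers <num_bins> <low_hz> <high_hz>
# Prints the centre frequency (Hz) of each triangular mel bin, one per line.
mel_centers() {
    awk -v n="$1" -v lo="$2" -v hi="$3" '
        function mel(f) { return 1127.0 * log(1.0 + f / 700.0) }
        function hz(m)  { return 700.0 * (exp(m / 1127.0) - 1.0) }
        BEGIN {
            mlo = mel(lo); mhi = mel(hi)
            delta = (mhi - mlo) / (n + 1)    # n bins need n + 2 edges
            for (b = 0; b < n; b++)
                printf "%.1f\n", hz(mlo + (b + 1) * delta)
        }
    '
}
```

`mel_centers 30 20 7600` prints 30 centres that lie strictly between the two cut-offs and are densest at low frequencies.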
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Allocating at least as many CPUs as GPUs is recommended
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
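Each `option` line in these scheduler files maps a generic `cmd.sh` option onto scheduler-specific flags: a line for an exact value (such as `gpu=0`) takes precedence, and otherwise the `name=*` line is used with every `$0` replaced by the requested value. A simplified sketch of that lookup; the `expand_option` helper is hypothetical, ignores `command` and `default` lines, and is not the actual parser inside `slurm.pl` or `queue.pl`.

```bash
#!/usr/bin/env bash
# expand_option <conf-file> <option-name> <value>
# Prints the flags an "option" line contributes for name=value; returns 1 if
# no line matches. An exact "option name=value ..." line wins over
# "option name=* ...", and in the wildcard case "$0" is replaced by the value.
expand_option() {
    local conf="$1" name="$2" value="$3"
    local token='$0' line exact="" wildcard="" have_exact=0 have_wild=0
    while IFS= read -r line || [[ -n "$line" ]]; do
        line="${line%%#*}"                        # strip comments
        line="${line%"${line##*[![:space:]]}"}"   # strip trailing whitespace
        if [[ "$line" == "option ${name}=${value}" ]]; then
            have_exact=1; exact=""
        elif [[ "$line" == "option ${name}=${value} "* ]]; then
            have_exact=1; exact="${line#"option ${name}=${value} "}"
        elif [[ "$line" == "option ${name}=* "* ]]; then
            have_wild=1; wildcard="${line#"option ${name}=* "}"
        fi
    done < "$conf"
    if (( have_exact )); then
        printf '%s\n' "$exact"
    elif (( have_wild )); then
        printf '%s\n' "${wildcard//"$token"/$value}"
    else
        return 1
    fi
}
```

With the Slurm file above, `expand_option conf/slurm.conf gpu 2` would print `-p gpu --gres=gpu:2 -c 2`, while `gpu 0` selects the exact-match line and prints `-p cpu`.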
@@ -0,0 +1,10 @@
# This configuration is the decoding setting for FastSpeech or FastSpeech2.

##########################################################
# DECODING SETTING #
##########################################################
speed_control_alpha: 1 # alpha to control the speed of generated speech
# alpha > 1 makes it slower and alpha < 1 makes it faster
use_teacher_forcing: false # whether to use teacher forcing
# if true, ground-truth durations are used
# (+ pitch & energy for FastSpeech2)
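`speed_control_alpha` acts on the predicted phoneme durations. A sketch, assuming each duration (in frames) is multiplied by alpha and rounded to the nearest integer before the length regulator expands the encoder outputs, which is our reading of ESPnet's FastSpeech length regulator; the `scale_durations` helper is hypothetical.

```bash
#!/usr/bin/env bash
# scale_durations <alpha> <d1> [<d2> ...]
# Prints the scaled per-phoneme durations on one line and the total number of
# output frames on the next line.
scale_durations() {
    local alpha="$1"; shift
    printf '%s\n' "$@" | awk -v a="$alpha" '
        {
            d = sprintf("%.0f", $1 * a)          # round to the nearest frame
            out = out (NR > 1 ? " " : "") d
            total += d
        }
        END { print out; print total }
    '
}
```

For example, `scale_durations 1.2 3 5 2` prints `4 6 2` and then `12`: the utterance grows from 10 to 12 frames, i.e. slower speech.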
@@ -0,0 +1,15 @@
# This configuration is the basic decoding setting for Tacotron 2.
# It can also be applied to Transformer. If you encounter problems
# such as deletions or repetitions, it is worthwhile to try
# `use_att_constraint: true` to make the generation more stable.
# Note that the attention constraint is not supported in Transformer.

##########################################################
# DECODING SETTING #
##########################################################
threshold: 0.5 # threshold to stop the generation
maxlenratio: 10.0 # maximum length of generated samples = input length * maxlenratio
minlenratio: 0.0 # minimum length of generated samples = input length * minlenratio
use_att_constraint: false # whether to use the attention constraint introduced in Deep Voice 3
backward_window: 1 # backward window size in the attention constraint
forward_window: 3 # forward window size in the attention constraint
@@ -0,0 +1,10 @@
# This configuration is the decoding setting for VITS.

##########################################################
# DECODING SETTING #
##########################################################
noise_scale: 0.667 # noise scale parameter for the flow in VITS.
noise_scale_dur: 0.8 # noise scale parameter for the stochastic duration predictor in VITS.
speed_control_alpha: 1 # alpha to control the speed of generated speech.
# alpha > 1 makes it slower and alpha < 1 makes it faster.
use_teacher_forcing: false # whether to use teacher forcing.
Please add line break