Migrate recipe for nit_song070 from Muskit #5251

Merged Jul 21, 2023

Commits (35):
4ddc1f5
Add nit_song recipe
wwwbxy123 Jun 7, 2023
b26743b
Update db.sh
wwwbxy123 Jun 7, 2023
42f4c82
Change the template back
wwwbxy123 Jun 7, 2023
c0c877a
Update format_midi_scp.py
wwwbxy123 Jun 7, 2023
e61866c
Delete format_midi_scp.py
wwwbxy123 Jun 7, 2023
64465ca
Add format_midi_scp.py in nit_song recipe
wwwbxy123 Jun 7, 2023
1b1e17e
Resume format_wv_scp
wwwbxy123 Jun 7, 2023
b5d8838
Modify import of format_midi_scp
wwwbxy123 Jun 7, 2023
7176f93
Recover db.sh symlink
wwwbxy123 Jun 7, 2023
04de29e
Add NIT_SONG070 in db.sh
wwwbxy123 Jun 7, 2023
86b9dac
Bug fix
wwwbxy123 Jun 7, 2023
fd0856d
Remove setup.sh
wwwbxy123 Jun 7, 2023
a45e8a3
Updates
wwwbxy123 Jun 7, 2023
b2bda1b
Update nit_song070
wwwbxy123 Jun 13, 2023
d799257
Make score.scp segment and TODO
wwwbxy123 Jun 13, 2023
45462ea
Copy score to each segment (trial, failed)
wwwbxy123 Jun 23, 2023
0f52a91
score.scp segment (copy only) fix
wwwbxy123 Jun 23, 2023
4c2d0a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 23, 2023
c060347
Remove fmin for SC2034 warning
wwwbxy123 Jun 29, 2023
8a067bb
Make segments in data_prep.py
wwwbxy123 Jun 30, 2023
71b9b3b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 30, 2023
e31f57c
Remove commented code and unused module
wwwbxy123 Jun 30, 2023
d77124d
Limit lines & remove unused create_midi method
wwwbxy123 Jun 30, 2023
425118b
Remove path for nit_song070
wwwbxy123 Jun 30, 2023
3f7e749
Merge branch 'master' into nit_song
wwwbxy123 Jun 30, 2023
7fe7045
Update run.sh & remove unnecessary configs
wwwbxy123 Jul 7, 2023
7b01b1c
Remove unnecessary change of format_score_scp.py
wwwbxy123 Jul 7, 2023
bb12bed
Update egs2/nit_song070/svs1/local/data.sh
wwwbxy123 Jul 8, 2023
44d05f0
Update egs2/nit_song070/svs1/local/data.sh
wwwbxy123 Jul 8, 2023
4bf1eb5
Merge branch 'master' into nit_song
wwwbxy123 Jul 8, 2023
32609d4
Add nit_song070 to egs2/README.md
wwwbxy123 Jul 13, 2023
5f9b5ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2023
cc74a7a
Make the wav file name consistent with the tag name
wwwbxy123 Jul 14, 2023
3c4aefc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2023
641d6ca
Merge branch 'master' into nit_song
ftshijt Jul 20, 2023
1 change: 1 addition & 0 deletions egs2/README.md
@@ -108,6 +108,7 @@ See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2
| mucs21_subtask2 | MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR | 2 codeswitching data | https://navana-tech.github.io/MUCS2021/challenge_details.html | |
| must_c | https://ict.fbk.eu/must-c/ | ASR/MT/ST | ENG->14langs | https://ict.fbk.eu/must-c/ | |
| must_c_v2 | https://ict.fbk.eu/must-c/ | ASR/MT/ST | ENG->DEU | https://ict.fbk.eu/must-c/ | |
| nit_song070 | The NITech Japanese speech database | SVS | JPN | http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_NIT-SONG070-F001.tar.bz2 | |
| nsc | National Speech Corpus | ASR | ENG-SG | https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus | |
| ofuton_p_utagoe_db | Ofuton_p_utagoe Singing voice synthesis corpus | SVS | JPN | https://sites.google.com/view/oftn-utagoedb/%E3%83%9B%E3%83%BC%E3%83%A0 | |
| oniku_kurumi_utagoe_db | Oniku Singing voice synthesis corpus | SVS | JPN | http://onikuru.info/db-download/ | |
1 change: 1 addition & 0 deletions egs2/TEMPLATE/asr1/db.sh
@@ -71,6 +71,7 @@ LJSPEECH=downloads
MUSAN=
MUST_C=downloads
NSC=
NIT_SONG070=
JMD=downloads
JSSS=downloads
JSUT=downloads
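The new NIT_SONG070 entry is left empty on purpose: each user points it at a local copy of the corpus (the recipe's own db.sh, shown later in this diff, is a symlink to this TEMPLATE file). A minimal sketch of a filled-in entry; the "downloads" value is illustrative, following the convention of neighboring entries such as MUST_C:

# In egs2/TEMPLATE/asr1/db.sh (reached via the egs2/nit_song070/svs1/db.sh
# symlink), set the corpus root; "downloads" is an example value only.
NIT_SONG070=downloads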
110 changes: 110 additions & 0 deletions egs2/nit_song070/svs1/cmd.sh
@@ -0,0 +1,110 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
# --time <time>: Limit the maximum time to execute.
# --mem <mem>: Limit the maximum memory usage.
# --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
# --num-threads <nthreads>: Specify the number of CPU cores.
# --gpu <ngpu>: Specify the number of GPU devices.
# --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and in the log file name,
# e.g. "echo JOB" becomes "echo 3" for the 3rd job and "echo 8" for the 8th job.
# Note that the index must start from a positive number, so "JOB=0:10" is not allowed.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl have a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, and the mapping
# is configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs failed, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================


# Select the backend used by run.sh from "local", "stdout", "sge", "slurm", or "ssh"
cmd_backend='local'

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

# Used for the other general-purpose jobs
export train_cmd="run.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="run.pl"
# Used for "*_recog.py"
export decode_cmd="run.pl"

# Local machine logging to stdout and log file, without any Job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

# Used for the other general-purpose jobs
export train_cmd="stdout.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="stdout.pl"
# Used for "*_recog.py"
export decode_cmd="stdout.pl"


# "qsub" (Sun Grid Engine, or derivation of it)
elif [ "${cmd_backend}" = sge ]; then
# The default setting is written in conf/queue.conf.
# You must change "-q g.q" to a "queue" that exists in your environment.
# To list the "queue" names, type "qhost -q".
# Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

export train_cmd="queue.pl"
export cuda_cmd="queue.pl"
export decode_cmd="queue.pl"


# "qsub" (Torque/PBS.)
elif [ "${cmd_backend}" = pbs ]; then
# The default setting is written in conf/pbs.conf.

export train_cmd="pbs.pl"
export cuda_cmd="pbs.pl"
export decode_cmd="pbs.pl"


# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
# The default setting is written in conf/slurm.conf.
# You must change "-p cpu" and "-p gpu" to "partition" names that exist in your environment.
# To list the "partition" names, type "sinfo".
# You can use "--gpu *" by default for slurm, and it is interpreted as "--gres gpu:*".
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

export train_cmd="slurm.pl"
export cuda_cmd="slurm.pl"
export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
# You have to create ".queue/machines" to specify the host to execute jobs.
# e.g. .queue/machines
# host1
# host2
# host3
# Assuming you can log in to them without a password, i.e. you have to set up ssh keys.

export train_cmd="ssh.pl"
export cuda_cmd="ssh.pl"
export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

export train_cmd="queue.pl --mem 2G"
export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf"
export decode_cmd="queue.pl --mem 4G"

else
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
return 1
fi
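As a concrete instance of the array-job interface described in the header comment, a call like the following (the path and job count are illustrative) expands JOB in both the log name and the command:

# Runs 4 jobs in parallel; JOB is replaced by 1..4, so the logs are
# exp/demo.1.log ... exp/demo.4.log and each job echoes its own index.
run.pl --mem 2G JOB=1:4 exp/demo.JOB.log echo "running job JOB"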
8 changes: 8 additions & 0 deletions egs2/nit_song070/svs1/conf/decode.yaml
@@ -0,0 +1,8 @@
# This configuration is the decoding setting for FastSpeech or FastSpeech2.

##########################################################
# DECODING SETTING #
##########################################################
use_teacher_forcing: false # whether to use teacher forcing
# if true, the ground-truth durations are used
# (+ pitch & energy for FastSpeech2)
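At inference time this file is handed to the recipe from the command line. A sketch, assuming the standard ESPnet2 option name --inference_config; the exact flag should be checked against this recipe's run.sh, which is not shown in this diff view:

# Decode with the settings above; the option name follows the common
# ESPnet2 convention and is an assumption, not part of this diff.
./run.sh --inference_config conf/decode.yaml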
11 changes: 11 additions & 0 deletions egs2/nit_song070/svs1/conf/pbs.conf
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
12 changes: 12 additions & 0 deletions egs2/nit_song070/svs1/conf/queue.conf
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
14 changes: 14 additions & 0 deletions egs2/nit_song070/svs1/conf/slurm.conf
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0  # Recommend allocating at least as many CPUs as GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
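pbs.conf, queue.conf, and slurm.conf all translate the unified <cmd>.pl options into backend-specific flags via their "option" lines. A sketch of the slurm case, assuming cmd_backend=slurm in cmd.sh; the command and path are illustrative:

# A unified call such as:
slurm.pl --gpu 1 --mem 8G JOB=1:2 exp/train.JOB.log hostname
# is rewritten by the "option" lines above into an sbatch submission roughly
# equivalent to:
#   sbatch --export=PATH -p gpu --gres=gpu:1 -c 1 --mem-per-cpu 8G ...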
1 change: 1 addition & 0 deletions egs2/nit_song070/svs1/conf/train.yaml
69 changes: 69 additions & 0 deletions egs2/nit_song070/svs1/conf/tuning/train_naive_rnn.yaml
@@ -0,0 +1,69 @@


##########################################################
# SVS MODEL SETTING #
##########################################################
svs: naive_rnn # model architecture
svs_conf: # keyword arguments for the selected model
    midi_dim: 129 # midi dimension (note number + silence)
    embed_dim: 512 # char or phn embedding dimension
    eprenet_conv_layers: 0 # prenet (from bytesing) conv layers
    eprenet_conv_chans: 256 # prenet (from bytesing) conv channel numbers
    eprenet_conv_filts: 3 # prenet (from bytesing) conv filter size
    elayers: 3 # number of lstm layers in encoder
    eunits: 512 # number of lstm units
    ebidirectional: True # whether the encoder is bidirectional
    midi_embed_integration_type: add # how to integrate midi information
    dlayers: 5 # number of lstm layers in decoder
    dunits: 1024 # number of lstm units in decoder
    dbidirectional: True # whether the decoder is bidirectional
    postnet_layers: 5 # number of layers in postnet
    postnet_chans: 512 # number of channels in postnet
    postnet_filts: 5 # filter size of postnet layer
    use_batch_norm: true # whether to use batch normalization in postnet
    reduction_factor: 1 # reduction factor
    eprenet_dropout_rate: 0.2 # prenet dropout rate
    edropout_rate: 0.1 # encoder dropout rate
    ddropout_rate: 0.1 # decoder dropout rate
    postnet_dropout_rate: 0.5 # postnet dropout rate
    init_type: pytorch # parameter initialization
    use_masking: true # whether to apply masking for padded parts in loss calculation
    loss_type: L1


##########################################################
# OPTIMIZER SETTING #
##########################################################
optim: adam # optimizer type
optim_conf: # keyword arguments for selected optimizer
    lr: 1.0e-03 # learning rate
    eps: 1.0e-06 # epsilon
    weight_decay: 0.0 # weight decay coefficient

##########################################################
# OTHER TRAINING SETTING #
##########################################################
# num_iters_per_epoch: 200 # number of iterations per epoch
max_epoch: 500 # number of epochs
grad_clip: 1.0 # gradient clipping norm
grad_noise: false # whether to use gradient noise injection
accum_grad: 1 # gradient accumulation

batch_type: sorted
batch_size: 8

sort_in_batch: descending # how to sort data in making batch
sort_batch: descending # how to sort created batches
num_workers: 8 # number of workers of data loader
train_dtype: float32 # dtype in training
log_interval: null # log interval in iterations
keep_nbest_models: 2 # number of models to keep
num_att_plot: 3 # number of attention figures to be saved in every check
seed: 0 # random seed number
best_model_criterion:
-   - valid
    - loss
    - min
-   - train
    - loss
    - min
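Tuning configs like this one are selected when the recipe is invoked. A sketch, assuming the standard ESPnet2 option names --train_config and --ngpu; check this recipe's run.sh for the exact flags:

# Train the naive RNN model with the settings above on one GPU; both option
# names follow the common ESPnet2 convention and are assumptions here.
./run.sh --train_config conf/tuning/train_naive_rnn.yaml --ngpu 1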
1 change: 1 addition & 0 deletions egs2/nit_song070/svs1/db.sh
55 changes: 55 additions & 0 deletions egs2/nit_song070/svs1/local/data.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash

set -e
set -u
set -o pipefail

. ./path.sh || exit 1;
. ./cmd.sh || exit 1;
. ./db.sh || exit 1;

log() {
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

SECONDS=0
stage=1
stop_stage=100
fs=24000

log "$0 $*"

. utils/parse_options.sh || exit 1;

if [ -z "${NIT_SONG070}" ]; then
log "Fill the value of 'NIT_SONG070' of db.sh"
exit 1
fi

mkdir -p ${NIT_SONG070}

train_set="train"
train_dev="dev"
eval_set="eval"

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
log "stage 0: Data Download"
# The nit data should be downloaded from http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_NIT-SONG070-F001.tar.bz2
# Terms from http://hts.sp.nitech.ac.jp/
fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
log "stage 1: Data preparaion "

mkdir -p score_dump
mkdir -p wav_dump
python local/data_prep.py ${NIT_SONG070}/HTS-demo_NIT-SONG070-F001/data --midi_note_scp local/midi-note.scp \
--score_dump score_dump \
--wav_dumpdir wav_dump \
--sr ${fs}
for src_data in ${train_set} ${train_dev} ${eval_set}; do
utils/utt2spk_to_spk2utt.pl < data/${src_data}/utt2spk > data/${src_data}/spk2utt
utils/fix_data_dir.sh --utt_extra_files "label score.scp" data/${src_data}
done
fi
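In stage 1, data_prep.py writes utt2spk for each subset, the loop above derives spk2utt from it, and fix_data_dir.sh then validates the directory (keeping the extra label and score.scp files in sync). A small illustration of the utt2spk-to-spk2utt transformation, with hypothetical utterance IDs:

# utt2spk lists "<utt-id> <speaker-id>" pairs, one per line, e.g.:
#   nit_song070_f001_seg001 f001
#   nit_song070_f001_seg002 f001
utils/utt2spk_to_spk2utt.pl < data/train/utt2spk > data/train/spk2utt
# spk2utt inverts this into one line per speaker:
#   f001 nit_song070_f001_seg001 nit_song070_f001_seg002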