Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSSS recipe #217

Merged
merged 8 commits into from
Oct 7, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions egs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ Currently, the following recipes are supported.

- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): English female speaker
- [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): Japanese female speaker
- [JSSS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus): Japanese female speaker
- [CSMSC](https://www.data-baker.com/open_source.html): Mandarin female speaker
- [CMU Arctic](http://www.festvox.org/cmu_arctic/): English speakers
- [JNAS](http://research.nii.ac.jp/src/en/JNAS.html): Japanese multi-speaker
Expand Down
91 changes: 91 additions & 0 deletions egs/jsss/voc1/cmd.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
# --time <time>: Limit the maximum time to execute.
# --mem <mem>: Limit the maximum memory usage.
# -–max-jobs-run <njob>: Limit the number parallel jobs. This is ignored for non-array jobs.
# --num-threads <ngpu>: Specify the number of CPU core.
# --gpu <ngpu>: Specify the number of GPU devices.
# --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N>(Nth job) in the command and the log file name,
# e.g. "echo JOB" is changed to "echo 3" for the 3rd job and "echo 8" for 8th job respectively.
# Note that the number must start with a positive number, so you can't use "JOB=0:10" for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl have unified interface, not depending on its backend.
# These options are mapping to specific options for each backend and
# it is configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs failed, your configuration might be wrong for your environment.
#
#
# The official documentaion for run.pl, queue.pl, slurm.pl, and ssh.pl:
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================~


# Select the backend used by run.sh from "local", "stdout", "sge", "slurm", or "ssh"
cmd_backend="local"

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

# The other usage
export train_cmd="utils/run.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="utils/run.pl"
# Used for "*_recog.py"
export decode_cmd="utils/run.pl"

# Local machine, without any Job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

# The other usage
export train_cmd="utils/stdout.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="utils/stdout.pl"
# Used for "*_recog.py"
export decode_cmd="utils/stdout.pl"

# "qsub" (SGE, Torque, PBS, etc.)
elif [ "${cmd_backend}" = sge ]; then
# The default setting is written in conf/queue.conf.
# You must change "-q g.q" for the "queue" for your environment.
# To know the "queue" names, type "qhost -q"
# Note that to use "--gpu *", you have to setup "complex_value" for the system scheduler.

export train_cmd="utils/queue.pl"
export cuda_cmd="utils/queue.pl"
export decode_cmd="utils/queue.pl"

# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
# The default setting is written in conf/slurm.conf.
# You must change "-p cpu" and "-p gpu" for the "partion" for your environment.
# To know the "partion" names, type "sinfo".
# You can use "--gpu * " by defualt for slurm and it is interpreted as "--gres gpu:*"
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

export train_cmd="utils/slurm.pl"
export cuda_cmd="utils/slurm.pl"
export decode_cmd="utils/slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
# You have to create ".queue/machines" to specify the host to execute jobs.
# e.g. .queue/machines
# host1
# host2
# host3
# Assuming you can login them without any password, i.e. You have to set ssh keys.

export train_cmd="utils/ssh.pl"
export cuda_cmd="utils/ssh.pl"
export decode_cmd="utils/ssh.pl"

else
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
return 1
fi
122 changes: 122 additions & 0 deletions egs/jsss/voc1/conf/parallel_wavegan.v1.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
# This is the hyperparameter configuration file for Parallel WaveGAN.
# Please make sure this is adjusted for the JSSS dataset. If you want to
# apply to the other dataset, you might need to carefully change some parameters.
# This configuration requires 12 GB GPU memory and takes ~3 days on RTX TITAN.

###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
sampling_rate: 24000 # Sampling rate.
fft_size: 2048 # FFT size.
hop_size: 300 # Hop size.
win_length: 1200 # Window length.
# If set to null, it will be the same as fft_size.
window: "hann" # Window function.
num_mels: 80 # Number of mel basis.
fmin: 80 # Minimum freq in mel basis calculation.
fmax: 7600 # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0 # Will be multiplied to all of waveform.
trim_silence: false # Whether to trim the start and end of silence.
trim_threshold_in_db: 40 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048 # Frame size in trimming.
trim_hop_size: 512 # Hop size in trimming.
format: "hdf5" # Feature file format. "npy" or "hdf5" is supported.

###########################################################
# GENERATOR NETWORK ARCHITECTURE SETTING #
###########################################################
generator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
kernel_size: 3 # Kernel size of dilated convolution.
layers: 30 # Number of residual block layers.
stacks: 3 # Number of stacks i.e., dilation cycles.
residual_channels: 64 # Number of channels in residual conv.
gate_channels: 128 # Number of channels in gated conv.
skip_channels: 64 # Number of channels in skip conv.
aux_channels: 80 # Number of channels for auxiliary feature conv.
# Must be the same as num_mels.
aux_context_window: 2 # Context window size for auxiliary feature.
# If set to 2, previous 2 and future 2 frames will be considered.
dropout: 0.0 # Dropout rate. 0.0 means no dropout applied.
use_weight_norm: true # Whether to use weight norm.
# If set to true, it will be applied to all of the conv layers.
upsample_net: "ConvInUpsampleNetwork" # Upsampling network architecture.
upsample_params: # Upsampling network parameters.
upsample_scales: [4, 5, 3, 5] # Upsampling scales. Prodcut of these must be the same as hop size.

###########################################################
# DISCRIMINATOR NETWORK ARCHITECTURE SETTING #
###########################################################
discriminator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
kernel_size: 3 # Number of output channels.
layers: 10 # Number of conv layers.
conv_channels: 64 # Number of chnn layers.
bias: true # Whether to use bias parameter in conv.
use_weight_norm: true # Whether to use weight norm.
# If set to true, it will be applied to all of the conv layers.
nonlinear_activation: "LeakyReLU" # Nonlinear function after each conv.
nonlinear_activation_params: # Nonlinear function parameters
negative_slope: 0.2 # Alpha in LeakyReLU.

###########################################################
# STFT LOSS SETTING #
###########################################################
stft_loss_params:
fft_sizes: [1024, 2048, 512] # List of FFT size for STFT-based loss.
hop_sizes: [120, 240, 50] # List of hop size for STFT-based loss
win_lengths: [600, 1200, 240] # List of window length for STFT-based loss.
window: "hann_window" # Window function for STFT-based loss

###########################################################
# ADVERSARIAL LOSS SETTING #
###########################################################
lambda_adv: 4.0 # Loss balancing coefficient.

###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 6 # Batch size.
batch_max_steps: 25500 # Length of each audio in batch. Make sure dividable by hop_size.
pin_memory: true # Whether to pin memory in Pytorch DataLoader.
num_workers: 2 # Number of workers in Pytorch DataLoader.
remove_short_samples: true # Whether to remove samples the length of which are less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.

###########################################################
# OPTIMIZER & SCHEDULER SETTING #
###########################################################
generator_optimizer_params:
lr: 0.0001 # Generator's learning rate.
eps: 1.0e-6 # Generator's epsilon.
weight_decay: 0.0 # Generator's weight decay coefficient.
generator_scheduler_params:
step_size: 200000 # Generator's scheduler step size.
gamma: 0.5 # Generator's scheduler gamma.
# At each step size, lr will be multiplied by this parameter.
generator_grad_norm: 10 # Generator's gradient norm.
discriminator_optimizer_params:
lr: 0.00005 # Discriminator's learning rate.
eps: 1.0e-6 # Discriminator's epsilon.
weight_decay: 0.0 # Discriminator's weight decay coefficient.
discriminator_scheduler_params:
step_size: 200000 # Discriminator's scheduler step size.
gamma: 0.5 # Discriminator's scheduler gamma.
# At each step size, lr will be multiplied by this parameter.
discriminator_grad_norm: 1 # Discriminator's gradient norm.

###########################################################
# INTERVAL SETTING #
###########################################################
discriminator_train_start_steps: 100000 # Number of steps to start to train discriminator.
train_max_steps: 400000 # Number of training steps.
save_interval_steps: 5000 # Interval steps to save checkpoint.
eval_interval_steps: 1000 # Interval steps to evaluate the network.
log_interval_steps: 100 # Interval steps to record the training log.

###########################################################
# OTHER SETTING #
###########################################################
num_save_intermediate_results: 4 # Number of results to be saved as intermediate results.
12 changes: 12 additions & 0 deletions egs/jsss/voc1/conf/slurm.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# Default configuration
command sbatch --export=PATH --ntasks-per-node=1
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* --cpus-per-task $0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task 1 --ntasks-per-node=1 # Do not add anything to qsub_opts
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
41 changes: 41 additions & 0 deletions egs/jsss/voc1/local/data_download.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
#!/bin/bash

# Copyright 2020 Tomoki Hayashi
# MIT License (https://opensource.org/licenses/MIT)

# Download JSSS Corpus

# shellcheck disable=SC1091
. ./path.sh || exit 1

download_dir=$1

# check arguments
if [ $# != 1 ]; then
echo "Usage: $0 <download_dir>"
exit 1
fi

set -euo pipefail

url="https://drive.google.com/a/g.sp.m.is.nagoya-u.ac.jp/uc?id=1NyiZCXkYTdYBNtD1B-IMAYCVa-0SQsKX"
if [ ! -e "${download_dir}/jsss_ver1" ]; then
utils/download_from_google_drive.sh "${url}" "${download_dir}" zip
echo "Successfully downloaded JSSS corpus."
else
echo "Already exists. Skipped."
fi

cwd=$(pwd)
if [ ! -e "${download_dir}/JSSSLabel" ]; then
echo "Downloading full-context labels for jsut v1.1..."
cd "${download_dir}"
git clone https://github.com/kan-bayashi/JSSSLabel
for name in long-form short-form simplification summarization; do
cp -vr JSSSLabel/${name} jsss_ver1
done
cd "${cwd}"
echo "Successfully downloaded JSSS label."
else
echo "Already exists. Skipped."
fi
Loading