How2 speech translation recipe #1102
Merged
19 commits (all by hirofumi0810):

e85a1d6  Add how2 recipe
fca89fe  Update readme
32e08b3  Update decode config
173e900  Remove fbank.conf
8360c9f  Add asr1
6383930  Remove unnecessary line
d5ab5cc  Add test set
3981e6b  Update decoding config
2ebefba  Add fbank description
38acaee  Add specaug config
d40b676  Merge remote-tracking branch 'upstream/master' into how2
69ab79a  Add score_sclite.sh
568a2e8  Add preprocess_config option
e3d1c43  Add specaug conf to asr1
a2deec2  Add copy_fbank option
5f3b536  Add comments
3c9e9e2  Add results
b85e01d  Fix results
ec7aebb  Merge remote-tracking branch 'upstream/master' into how2
@@ -0,0 +1,89 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
#   --time <time>: Limit the maximum time to execute.
#   --mem <mem>: Limit the maximum memory usage.
#   --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
#   --num-threads <nthread>: Specify the number of CPU cores.
#   --gpu <ngpu>: Specify the number of GPU devices.
#   --config: Change the configuration file from the default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and the log file name,
# e.g. "echo JOB" is changed to "echo 3" for the 3rd job and "echo 8" for the 8th job, respectively.
# Note that the index must start from a positive number, so you can't use "JOB=0:10", for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl have a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, as configured
# by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
#   "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================

# Select the backend used by run.sh from "local", "sge", "slurm", or "ssh"
cmd_backend='local'

# Local machine, without any job scheduling system
if [ "${cmd_backend}" = local ]; then

    # The other usage
    export train_cmd="run.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="run.pl"
    # Used for "*_recog.py"
    export decode_cmd="run.pl"

# "qsub" (SGE, Torque, PBS, etc.)
elif [ "${cmd_backend}" = sge ]; then
    # The default setting is written in conf/queue.conf.
    # You must change "-q g.q" to the "queue" for your environment.
    # To list the "queue" names, type "qhost -q".
    # Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

    export train_cmd="queue.pl"
    export cuda_cmd="queue.pl"
    export decode_cmd="queue.pl"

# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
    # The default setting is written in conf/slurm.conf.
    # You must change "-p cpu" and "-p gpu" to the "partition" for your environment.
    # To list the "partition" names, type "sinfo".
    # You can use "--gpu *" by default for Slurm and it is interpreted as "--gres gpu:*".
    # The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

    export train_cmd="slurm.pl"
    export cuda_cmd="slurm.pl"
    export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
    # You have to create ".queue/machines" to specify the hosts on which to execute jobs.
    # e.g. .queue/machines
    #   host1
    #   host2
    #   host3
    # This assumes you can log in to them without a password, i.e. you have to set up ssh keys.

    export train_cmd="ssh.pl"
    export cuda_cmd="ssh.pl"
    export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

    export train_cmd="queue.pl --mem 2G"
    export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/gpu.conf"
    export decode_cmd="queue.pl --mem 4G"

else
    echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
    return 1
fi
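The "JOB=1:N" placeholder substitution described in the header comments can be illustrated with a plain shell loop. This is only a sketch of the substitution itself; run.pl additionally handles per-job log files and parallel dispatch.

```shell
# Emulate run.pl's "JOB" substitution: every occurrence of the literal
# string JOB in the command is replaced by the job index before execution.
cmd="echo JOB"
results=""
for n in 1 2 3; do
  results="$results$(printf '%s' "$cmd" | sed "s/JOB/$n/g" | sh) "
done
echo "$results"   # -> "1 2 3 "
```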
@@ -0,0 +1 @@
tuning/decode_rnn.yaml
@@ -0,0 +1,10 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option mem=* -l mem_free=$0,ram_free=$0
option mem=0          # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1  # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
default gpu=0
option gpu=0
option gpu=* -l 'hostname=b1[12345678]*|c*,gpu=$0' -q g.q
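In these conf files, `$0` inside an `option` line is a placeholder for the value passed to the corresponding wrapper flag, so `--mem 4G` is expanded through the `mem=*` template. A minimal sketch of that substitution (the real expansion is performed inside queue.pl itself):

```shell
# "option mem=* -l mem_free=$0,ram_free=$0" is a template: queue.pl
# substitutes the value given to --mem for every "$0" occurrence.
template='-l mem_free=$0,ram_free=$0'
value='4G'
expanded=$(printf '%s' "$template" | sed "s/\$0/$value/g")
echo "$expanded"   # -> -l mem_free=4G,ram_free=4G
```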
@@ -0,0 +1 @@
tuning/lm.yaml
@@ -0,0 +1,10 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option mem=* -l mem_free=$0,ram_free=$0
option mem=0          # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1  # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
@@ -0,0 +1,12 @@
# Default configuration
command sbatch --export=PATH --ntasks-per-node=1
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0          # Do not add anything to qsub_opts
option num_threads=* --cpus-per-task $0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task 1 --ntasks-per-node=1  # Do not add anything to qsub_opts
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
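The same template mechanism drives the Slurm config: a `--gpu 2` request matches the `gpu=*` line, so the generated sbatch invocation carries the partition and gres flags. A sketch of the option string that would result (slurm.pl assembles the real command line):

```shell
# Derive the sbatch options implied by "--gpu 2" under the
# "option gpu=0 -p cpu" / "option gpu=* -p gpu --gres=gpu:$0" lines above.
ngpu=2
if [ "$ngpu" -eq 0 ]; then
  gpu_opts="-p cpu"
else
  gpu_opts="-p gpu --gres=gpu:$ngpu"
fi
echo "sbatch --export=PATH --ntasks-per-node=1 $gpu_opts"
```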
@@ -0,0 +1 @@
tuning/train_rnn.yaml
@@ -0,0 +1,7 @@
batchsize: 0
beam-size: 10
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
ctc-weight: 0.5
lm-weight: 0.7
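Each key in a decode config like this one ends up as a `--key value` flag on the decoding script's command line. A sketch of that convention (the actual parsing is done by the recipe's config loader, not by this loop):

```shell
# Turn "key: value" config lines into "--key value" flags, mirroring how
# the decode config above reaches the decoding command line.
flags=""
while IFS=': ' read -r key val; do
  [ -n "$key" ] && flags="$flags --$key $val"
done <<'EOF'
beam-size: 10
ctc-weight: 0.5
lm-weight: 0.7
EOF
echo "$flags"   # -> " --beam-size 10 --ctc-weight 0.5 --lm-weight 0.7"
```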
@@ -0,0 +1,6 @@
lm-weight: 0.3
beam-size: 20
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
ctc-weight: 0.3
@@ -0,0 +1,8 @@
layer: 2
unit: 1024
opt: adam       # or sgd
sortagrad: 0    # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batchsize: 256  # batch size in LM training
epoch: 60       # if the data size is large, we can reduce this
patience: 3
maxlen: 150     # if sentence length > lm_maxlen, lm_batchsize is automatically reduced
@@ -0,0 +1,40 @@
# network architecture
# encoder related
elayers: 12
eunits: 2048
# decoder related
dlayers: 6
dunits: 2048
# attention related
adim: 256
aheads: 4

# hybrid CTC/attention
mtlalpha: 0.3

# label smoothing
lsm-weight: 0.1

# minibatch related
batch-size: 32
maxlen-in: 512  # if input length > maxlen-in, batchsize is automatically reduced
maxlen-out: 150 # if output length > maxlen-out, batchsize is automatically reduced

# optimization related
sortagrad: 0 # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
opt: noam
accum-grad: 2
grad-clip: 5
patience: 0
epochs: 100
dropout-rate: 0.1

# transformer specific setting
backend: pytorch
model-module: "espnet.nets.pytorch_backend.e2e_asr_transformer:E2E"
transformer-input-layer: conv2d # encoder architecture type
transformer-lr: 10.0
transformer-warmup-steps: 25000
transformer-attn-dropout-rate: 0.0
transformer-length-normalized-loss: false
transformer-init: pytorch
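For orientation, `transformer-lr` and `transformer-warmup-steps` parameterize the standard Noam schedule, lr = k * adim^-0.5 * min(step^-0.5, step * warmup^-1.5); treat this as a sketch, since the exact scaling constant in the implementation may differ. With the values above (k=10.0, adim=256, warmup=25000), the peak rate at the end of warmup works out to roughly 0.00395:

```shell
# Evaluate the Noam learning-rate formula at the end of warmup
# (step == warmup is where the two min() branches meet).
awk -v k=10.0 -v d=256 -v warmup=25000 -v step=25000 'BEGIN {
  a = 1 / sqrt(step)
  b = step / (warmup * sqrt(warmup))
  m = (a < b) ? a : b
  printf "%.6f\n", k / sqrt(d) * m
}'
```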
@@ -0,0 +1,39 @@
# network architecture
# encoder related
etype: vggblstm # encoder architecture type
elayers: 5
eunits: 1024
eprojs: 1024
subsample: "1_2_2_1_1" # skip every n frame from input to nth layers
# decoder related
dlayers: 2
dunits: 1024
context-residual: true
# attention related
atype: location
adim: 1024
aconv-chans: 10
aconv-filts: 100

# hybrid CTC/attention
mtlalpha: 0.5

# minibatch related
batch-size: 20
maxlen-in: 800  # if input length > maxlen-in, batchsize is automatically reduced
maxlen-out: 150 # if output length > maxlen-out, batchsize is automatically reduced

# other config
dropout-rate: 0.3
dropout-rate-decoder: 0.0
lsm-weight: 0.1
weight-decay: 0

# optimization related
sortagrad: 0 # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
opt: adadelta
epochs: 15
patience: 3

# scheduled sampling option
sampling-probability: 0.0
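The `subsample: "1_2_2_1_1"` string gives a per-layer frame-skipping factor, so the overall time reduction is the product of its entries (here 4x; note that for a `vggblstm` encoder the convolutional front-end typically provides this reduction, so read the sketch as illustrating the notation):

```shell
# Total temporal subsampling implied by "1_2_2_1_1": each entry is the
# decimation applied at the corresponding encoder layer.
total=1
for f in $(printf '%s' "1_2_2_1_1" | tr '_' ' '); do
  total=$((total * f))
done
echo "overall subsampling factor: $total"   # -> 4
```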
@@ -0,0 +1 @@
../st1/local
@@ -0,0 +1,30 @@
MAIN_ROOT=$PWD/../../..
KALDI_ROOT=$MAIN_ROOT/tools/kaldi

[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sctk/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

export PATH=$MAIN_ROOT/tools/moses/scripts/tokenizer/:$MAIN_ROOT/tools/moses/scripts/generic/:$PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$MAIN_ROOT/tools/chainer_ctc/ext/warp-ctc/build
if [ -e $MAIN_ROOT/tools/venv/etc/profile.d/conda.sh ]; then
    source $MAIN_ROOT/tools/venv/etc/profile.d/conda.sh && conda deactivate && conda activate
else
    source $MAIN_ROOT/tools/venv/bin/activate
fi
export PATH=$MAIN_ROOT/utils:$MAIN_ROOT/espnet/bin:$PATH

export OMP_NUM_THREADS=1

# check extra module installation
if ! which tokenizer.perl > /dev/null; then
    echo "Error: it seems that moses is not installed." >&2
    echo "Error: please install moses as follows." >&2
    echo "Error: cd ${MAIN_ROOT}/tools && make moses.done" >&2
    return 1
fi

# NOTE(kan-bayashi): Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
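The moses check at the bottom follows a common pattern: probe PATH for a required binary, silencing the probe's output and consulting only its exit status, and print actionable install instructions on failure. A generic sketch of that pattern (the function name and the probed tools here are placeholders, not part of the recipe):

```shell
# Probe PATH for a required tool and fail with guidance if it is absent.
check_tool() {
  if ! which "$1" > /dev/null 2>&1; then
    echo "Error: it seems that $1 is not installed." >&2
    return 1
  fi
}
check_tool sh && echo "sh found"
```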
It seems that the decoding configs for the RNN and the transformer are very different. If you didn't test one of them, you can remove that config.
I want to keep this as it is.