
ASR2 recipe on Tedlium3 dataset #5331

Merged: 13 commits, Oct 19, 2023
21 changes: 16 additions & 5 deletions egs2/TEMPLATE/asr2/asr2.sh
@@ -821,7 +821,18 @@ if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ] && ! [[ " ${skip_stages} " =~ [
for utt_extra_file in ${utt_extra_files}; do
cp "${data_feats}/org/${dset}/${utt_extra_file}" "${data_feats}/${dset}"
done
# TODO: Maybe Remove empty text

# Remove empty text
cat "${data_feats}/org/${dset}/text.${tgt_case}.${tgt_lang}" | awk ' { if( NF != 1 ) print $0; } ' > "${data_feats}/${dset}/text.${tgt_case}.${tgt_lang}"
utils/filter_scp.pl "${data_feats}/${dset}/text.${tgt_case}.${tgt_lang}" "${data_feats}/org/${dset}/utt2spk" > "${data_feats}/${dset}/utt2spk"
utils/fix_data_dir.sh \
--utt_extra_files "${utt_extra_files}" "${data_feats}/${dset}"

# Check how many samples are removed
org_num_samples=$(wc -l "${data_feats}/org/${dset}/utt2spk" | cut -d' ' -f1)
filtered_num_samples=$(wc -l "${data_feats}/${dset}/utt2spk" | cut -d' ' -f1)
echo "filter samples with empty texts: removed $((org_num_samples - filtered_num_samples)) samples with empty text"

# TODO: Add other data cleaning -- currently being done as part of data.sh
done
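
To illustrate the new filter, here is a minimal sketch with made-up utterance IDs: in a Kaldi-style `text` file the first field is the utterance ID and the remaining fields are the transcript, so a line with a single field (`NF == 1`) carries no transcript and is dropped; `utils/filter_scp.pl` then restricts `utt2spk` to the surviving IDs.

```
# Toy demonstration of the empty-text filter above (file name and IDs are illustrative).
cat > text.example <<'EOF'
utt1 hello world
utt2
utt3 another transcript
EOF

# Keep only lines with more than one field, i.e. utterances with a non-empty transcript.
awk '{ if (NF != 1) print $0; }' text.example
# prints:
#   utt1 hello world
#   utt3 another transcript
```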

@@ -1291,16 +1302,16 @@ if [ ${stage} -le 13 ] && [ ${stop_stage} -ge 13 ] && ! [[ " ${skip_stages} " =~
log "${_split_dir}/.done exists. Spliting is skipped"
fi

_opts+="--train_data_path_and_name_and_type ${_split_dir}/text.${tgt_case}.${tgt_lang},text,text "
_opts+="--train_data_path_and_name_and_type ${_split_dir}/text.${src_case}.${src_lang},src_text,text "
_opts+="--train_shape_file ${_split_dir}/text_shape.${tgt_token_type} "
_opts+="--train_data_path_and_name_and_type ${_split_dir}/text.${tgt_case}.${tgt_lang},text,text "
_opts+="--train_shape_file ${_split_dir}/src_text_shape.${src_token_type} "
_opts+="--train_shape_file ${_split_dir}/text_shape.${tgt_token_type} "
_opts+="--multiple_iterator true "
else
_opts+="--train_data_path_and_name_and_type ${_asr_train_dir}/text.${tgt_case}.${tgt_lang},text,text "
_opts+="--train_data_path_and_name_and_type ${_asr_train_dir}/text.${src_case}.${src_lang},src_text,text "
_opts+="--train_shape_file ${asr_stats_dir}/train/text_shape.${tgt_token_type} "
_opts+="--train_data_path_and_name_and_type ${_asr_train_dir}/text.${tgt_case}.${tgt_lang},text,text "
_opts+="--train_shape_file ${asr_stats_dir}/train/src_text_shape.${src_token_type} "
_opts+="--train_shape_file ${asr_stats_dir}/train/text_shape.${tgt_token_type} "
fi

log "Generate '${asr_exp}/run.sh'. You can resume the process from stage 13 using this script"
70 changes: 70 additions & 0 deletions egs2/tedlium3/asr2/README.md
@@ -0,0 +1,70 @@
# ASR2 recipe for Tedlium3
## Related work:
Chang, Xuankai, et al. "Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning." InterSpeech 2023.
<details>
<summary>bib info</summary>

```
@article{chang2023exploration,
title={Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning},
author={Chang, Xuankai and Yan, Brian and Fujita, Yuya and Maekaku, Takashi and Watanabe, Shinji},
journal={arXiv preprint arXiv:2305.18108},
year={2023}
}
```
</details>


# E-Branchformer ASR2 Discrete tokens with WavLM_large_Layer21_Kmeans1000_nBPE2000 (~14.5 hours for 35 epochs with A5000 x 1)

## Environments
- date: `Thu Oct 19 22:11:12 JST 2023`
- python version: `3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]`
- espnet version: `espnet 202308`
- pytorch version: `pytorch 1.13.1`
- Git hash: `7bcdab47ff7f47e55d52061e55db4128913f32b6`
- Commit date: `Thu Aug 31 20:42:18 2023 +0900`

## Model info
- Model link: https://huggingface.co/espnet/kohei0209_ted3_asr2_e_branchformer1_raw_wavlm_large_21_km1000_bpe_rm2000_bpe_ts500_sp
- ASR config: [conf/tuning/train_discrete_asr_e_branchformer1.yaml](conf/tuning/train_discrete_asr_e_branchformer1.yaml)
- Decode config: [conf/tuning/decode_ctc0.3.yaml](conf/tuning/decode_ctc0.3.yaml)
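
To try the released model locally, one option is to clone the Hugging Face repository linked above (this assumes `git-lfs` is available; any other download method works just as well):

```
# Fetch the pretrained model files from Hugging Face (large files require git-lfs).
git lfs install
git clone https://huggingface.co/espnet/kohei0209_ted3_asr2_e_branchformer1_raw_wavlm_large_21_km1000_bpe_rm2000_bpe_ts500_sp
```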


## exp/asr_train_discrete_asr_e_branchformer1_raw_wavlm_large_21_km1000_bpe_rm2000_bpe_ts500_sp/
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_model_valid.acc.ave/test|1155|27500|94.6|3.4|2.0|3.5|8.9|79.0|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_model_valid.acc.ave/test|1155|145066|97.4|0.9|1.7|4.2|6.7|79.0|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_model_valid.acc.ave/test|1155|54206|96.1|2.2|1.7|3.8|7.7|79.0|

## exp/asr_train_discrete_asr_e_branchformer1_raw_wavlm_large_21_km1000_bpe_rm2000_bpe_ts500_sp/decode_asr_model_valid.acc.ave
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|org/dev|507|17783|94.2|3.7|2.2|3.2|9.0|84.8|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|org/dev|507|95429|97.2|0.9|1.9|3.6|6.3|84.8|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|org/dev|507|36002|95.8|2.3|1.9|3.2|7.4|84.8|
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/asr2.sh
110 changes: 110 additions & 0 deletions egs2/tedlium3/asr2/cmd.sh
@@ -0,0 +1,110 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
# --time <time>: Limit the maximum time to execute.
# --mem <mem>: Limit the maximum memory usage.
# --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
# --num-threads <nthreads>: Specify the number of CPU cores.
# --gpu <ngpu>: Specify the number of GPU devices.
# --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N>(Nth job) in the command and the log file name,
# e.g. "echo JOB" is changed to "echo 3" for the 3rd job and "echo 8" for 8th job respectively.
# Note that the number must start with a positive number, so you can't use "JOB=0:10" for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl share a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, as configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================
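
As a small, self-contained illustration of the array-job syntax described above (the log paths are made up), the same line works with any of the wrappers; only the backend behind it changes:

```
# Run 4 jobs; "JOB" is replaced by 1..4 in both the command and the log file name.
run.pl JOB=1:4 exp/demo/log/echo.JOB.log echo "this is job JOB"
# -> writes exp/demo/log/echo.1.log ... exp/demo/log/echo.4.log

# The same command can go through a scheduler by swapping the wrapper, e.g.
#   slurm.pl --mem 4G JOB=1:4 exp/demo/log/echo.JOB.log echo "this is job JOB"
```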


# Select the backend used by run.sh from "local", "stdout", "sge", "pbs", "slurm", "ssh", or "jhu"
cmd_backend='local'

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

# Used for jobs other than training and decoding (e.g. data preparation and feature extraction)
export train_cmd="run.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="run.pl"
# Used for "*_recog.py"
export decode_cmd="run.pl"

# Local machine logging to stdout and log file, without any Job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

# Used for jobs other than training and decoding (e.g. data preparation and feature extraction)
export train_cmd="stdout.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="stdout.pl"
# Used for "*_recog.py"
export decode_cmd="stdout.pl"


# "qsub" (Sun Grid Engine, or derivation of it)
elif [ "${cmd_backend}" = sge ]; then
# The default setting is written in conf/queue.conf.
# You must change "-q g.q" for the "queue" for your environment.
# To know the "queue" names, type "qhost -q"
# Note that to use "--gpu *", you have to setup "complex_value" for the system scheduler.

export train_cmd="queue.pl"
export cuda_cmd="queue.pl"
export decode_cmd="queue.pl"


# "qsub" (Torque/PBS.)
elif [ "${cmd_backend}" = pbs ]; then
# The default setting is written in conf/pbs.conf.

export train_cmd="pbs.pl"
export cuda_cmd="pbs.pl"
export decode_cmd="pbs.pl"


# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
# The default setting is written in conf/slurm.conf.
# You must change "-p cpu" and "-p gpu" for the "partition" for your environment.
# To know the "partion" names, type "sinfo".
# You can use "--gpu * " by default for slurm and it is interpreted as "--gres gpu:*"
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

export train_cmd="slurm.pl"
export cuda_cmd="slurm.pl"
export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
# You have to create ".queue/machines" to specify the host to execute jobs.
# e.g. .queue/machines
# host1
# host2
# host3
# This assumes you can log in to them without a password, i.e. you have set up ssh keys.

export train_cmd="ssh.pl"
export cuda_cmd="ssh.pl"
export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

export train_cmd="queue.pl --mem 2G"
export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf"
export decode_cmd="queue.pl --mem 4G"

else
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
return 1
fi
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/conf/decode.yaml
2 changes: 2 additions & 0 deletions egs2/tedlium3/asr2/conf/fbank.conf
@@ -0,0 +1,2 @@
--sample-frequency=16000
--num-mel-bins=80
11 changes: 11 additions & 0 deletions egs2/tedlium3/asr2/conf/pbs.conf
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/conf/pitch.conf
@@ -0,0 +1 @@
--sample-frequency=16000
12 changes: 12 additions & 0 deletions egs2/tedlium3/asr2/conf/queue.conf
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
14 changes: 14 additions & 0 deletions egs2/tedlium3/asr2/conf/slurm.conf
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0  # Recommended: allocate at least as many CPUs as GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
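
To make the mapping concrete, here is a rough sketch of how slurm.pl translates the generic options into sbatch flags according to this table (the exact generated command may differ slightly between versions):

```
# Backend-independent invocation:
#   slurm.pl --mem 8G --gpu 1 JOB=1:2 exp/log/train.JOB.log <command>
# With the mapping above this roughly becomes, per job:
#   sbatch --export=PATH --mem-per-cpu 8G -p gpu --gres=gpu:1 -c 1 ...
# whereas the default "--gpu 0" selects the CPU partition instead:
#   sbatch --export=PATH -p cpu ...
```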
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/conf/train.yaml
6 changes: 6 additions & 0 deletions egs2/tedlium3/asr2/conf/tuning/decode_ctc0.3.yaml
@@ -0,0 +1,6 @@
beam_size: 10
ctc_weight: 0.3
lm_weight: 0.0
maxlenratio: 1.0
minlenratio: 0.0
penalty: 0.0
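
For context, these weights feed ESPnet's joint CTC/attention beam search. A sketch of how the per-hypothesis score is roughly composed (the LM term vanishes here because lm_weight is 0, while maxlenratio/minlenratio bound the output length relative to the input):

```
\mathrm{score}(y \mid x) =
    \texttt{ctc\_weight} \cdot \log p_{\mathrm{ctc}}(y \mid x)
  + (1 - \texttt{ctc\_weight}) \cdot \log p_{\mathrm{att}}(y \mid x)
  + \texttt{lm\_weight} \cdot \log p_{\mathrm{lm}}(y)
  + \texttt{penalty} \cdot |y|
```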
93 changes: 93 additions & 0 deletions egs2/tedlium3/asr2/conf/tuning/train_discrete_asr_e_branchformer1.yaml
@@ -0,0 +1,93 @@
# Trained with A5000 (24 GB) x 1 GPU for Kmeans1000+nbpe2000. It takes 24 minutes per epoch.
# BPE-Dropout (https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout)
src_tokenizer_encode_conf:
enable_sampling: true # If set to true, bpe-dropout is used.
alpha: 0.4
nbest_size: -1

frontend: embed # embedding + positional encoding
frontend_conf:
embed_dim: 512
positional_dropout_rate: 0.1

specaug: specaug
specaug_conf:
apply_time_warp: false
time_warp_window: 5
time_warp_mode: bicubic
apply_freq_mask: false
freq_mask_width_range:
- 0
- 10
num_freq_mask: 0
apply_time_mask: true
time_mask_width_ratio_range:
- 0.
- 0.05
num_time_mask: 10

encoder: e_branchformer
encoder_conf:
output_size: 256
attention_heads: 4
attention_layer_type: rel_selfattn
pos_enc_layer_type: rel_pos
rel_pos_type: latest
cgmlp_linear_units: 1024
cgmlp_conv_kernel: 31
use_linear_after_conv: false
gate_activation: identity
num_blocks: 12
dropout_rate: 0.1
positional_dropout_rate: 0.1
attention_dropout_rate: 0.1
input_layer: conv1d2
layer_drop_rate: 0.0
linear_units: 1024
positionwise_layer_type: linear
use_ffn: true
macaron_ffn: true
merge_conv_kernel: 31

decoder: transformer
decoder_conf:
attention_heads: 4
linear_units: 2048
num_blocks: 6
dropout_rate: 0.1
positional_dropout_rate: 0.1
self_attention_dropout_rate: 0.1
src_attention_dropout_rate: 0.1
layer_drop_rate: 0.0

model: discrete_asr
model_conf:
ctc_weight: 0.3
lsm_weight: 0.1
length_normalized_loss: false
share_decoder_input_output_embed: false
share_encoder_decoder_input_embed: false

use_amp: true
num_att_plot: 1
log_interval: 500
num_workers: 4
batch_type: numel
batch_bins: 80000000
accum_grad: 2
max_epoch: 35
patience: none
init: none
best_model_criterion:
- - valid
- acc
- max
keep_nbest_models: 10

optim: adam
optim_conf:
lr: 0.001
weight_decay: 0.000001
scheduler: warmuplr
scheduler_conf:
warmup_steps: 10000
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/db.sh
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/local/data.sh
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/local/download_data.sh
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/local/join_suffix.py
Empty file.
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/local/prepare_data.sh
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/path.sh
1 change: 1 addition & 0 deletions egs2/tedlium3/asr2/pyscripts