Merge pull request #5350 from akreal/asr2-librispeech-100
Add discrete-token ASR for LibriSpeech 100h
Showing 19 changed files with 364 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# ASR2 recipe for LibriSpeech100 | ||
## Related work: | ||
Chang, Xuankai, et al. "Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning." Interspeech 2023. | ||
<details> | ||
<summary>bib info</summary> | ||
|
||
``` | ||
@article{chang2023exploration, | ||
title={Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning}, | ||
author={Chang, Xuankai and Yan, Brian and Fujita, Yuya and Maekaku, Takashi and Watanabe, Shinji}, | ||
journal={arXiv preprint arXiv:2305.18108}, | ||
year={2023} | ||
} | ||
``` | ||
</details> | ||
|
||
# E-Branchformer ASR2 Discrete tokens with WavLM_large_Layer21_Kmeans2000_nBPE5000 (7 hours for 70 epochs with A6000 x 1) | ||
|
||
## Environments | ||
- date: `Sun Jul 23 11:19:30 CEST 2023` | ||
- python version: `3.10.8 (main, Nov 14 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)]` | ||
- espnet version: `espnet 202304` | ||
- pytorch version: `pytorch 1.13.1+cu117` | ||
- Git hash: `64a1cabc6e7fe4fd22d46b788cec29ba6a37801e` | ||
- Commit date: `Sat Jul 22 23:17:44 2023 +0200` | ||
- ASR config: [conf/train_discrete_asr_e_branchformer1_1gpu.yaml](conf/train_discrete_asr_e_branchformer1_1gpu.yaml) | ||
- Decode config: [conf/decode_ctc0.3.yaml](conf/decode_ctc0.3.yaml) | ||
- Pretrained model: [https://huggingface.co/espnet/akreal_ls100_asr2_e_branchformer1_1gpu_raw_wavlm_large_21_km2k_bpe_rm6k_bpe_ts5k_sp](https://huggingface.co/espnet/akreal_ls100_asr2_e_branchformer1_1gpu_raw_wavlm_large_21_km2k_bpe_rm6k_bpe_ts5k_sp) | ||
|
||
## exp/asr_train_discrete_asr_e_branchformer1_1gpu_raw_wavlm_large_21_km2000_bpe_rm6000_bpe_ts5000_sp | ||
### WER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_clean|2703|54402|96.5|3.3|0.2|0.4|3.8|42.5| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_other|2864|50948|93.9|5.7|0.4|0.5|6.7|54.4| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_clean|2620|52576|96.5|3.3|0.2|0.4|3.9|43.2| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_other|2939|52343|93.6|5.9|0.5|0.6|7.0|56.2| | ||
|
||
### CER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_clean|2703|288456|99.0|0.6|0.4|0.4|1.4|42.5| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_other|2864|265951|97.8|1.3|0.9|0.7|2.9|54.4| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_clean|2620|281530|99.0|0.6|0.5|0.4|1.4|43.2| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_other|2939|272758|97.9|1.2|0.9|0.8|2.8|56.2| | ||
|
||
### TER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_clean|2703|69558|94.8|3.2|2.0|0.5|5.6|42.5| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/dev_other|2864|64524|91.3|5.7|3.0|1.2|9.9|54.4| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_clean|2620|66983|94.8|3.2|2.0|0.5|5.7|43.2| | ||
|decode_ctc0.3_asr_model_valid.acc.ave/test_other|2939|66650|91.4|5.7|2.9|1.2|9.7|56.2| |
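The `Err` column in the tables above is the sum of the substitution, deletion, and insertion rates, and `Corr` is 100 minus the substitution and deletion rates. A minimal sketch of that arithmetic follows; the helper name `err_rate` is hypothetical and not part of the recipe. Because the tables report rounded percentages, recomputing `Err` from the rounded columns can differ from the reported value by 0.1.

```shell
# Hypothetical helper: Err% = Sub% + Del% + Ins%.
# Inputs are the rounded percentages from a table row, so the result
# may differ from the reported Err column by 0.1.
err_rate() (
  awk -v s="$1" -v d="$2" -v i="$3" 'BEGIN { printf "%.1f\n", s + d + i }'
)

# WER row for test_other: Sub=5.9, Del=0.5, Ins=0.6
err_rate 5.9 0.5 0.6   # prints 7.0, matching the reported Err
```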
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr2/asr2.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ====== | ||
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...> | ||
# e.g. | ||
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB | ||
# | ||
# Options: | ||
# --time <time>: Limit the maximum time to execute. | ||
# --mem <mem>: Limit the maximum memory usage. | ||
# --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs. | ||
# --num-threads <nthreads>: Specify the number of CPU cores. | ||
# --gpu <ngpu>: Specify the number of GPU devices. | ||
# --config: Change the configuration file from default. | ||
# | ||
# "JOB=1:10" defines an "array job" and controls the number of parallel jobs. | ||
# The string to the left of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and the log file name, | ||
# e.g. "echo JOB" becomes "echo 3" for the 3rd job and "echo 8" for the 8th job. | ||
# Note that the range must start with a positive number, so you can't use "JOB=0:10", for example. | ||
# | ||
# run.pl, queue.pl, slurm.pl, and ssh.pl share a unified interface that does not depend on the backend. | ||
# The options above are mapped to backend-specific options, | ||
# as configured in "conf/queue.conf" and "conf/slurm.conf" by default. | ||
# If jobs fail, your configuration might be wrong for your environment. | ||
# | ||
# | ||
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl: | ||
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html | ||
# =========================================================~ | ||
|
||
|
||
# Select the backend used by run.sh from "local", "stdout", "sge", "pbs", "slurm", "ssh", or "jhu" | ||
cmd_backend='local' | ||
|
||
# Local machine, without any Job scheduling system | ||
if [ "${cmd_backend}" = local ]; then | ||
|
||
# The other usage | ||
export train_cmd="run.pl" | ||
# Used for "*_train.py": "--gpu" is appended optionally by run.sh | ||
export cuda_cmd="run.pl" | ||
# Used for "*_recog.py" | ||
export decode_cmd="run.pl" | ||
|
||
# Local machine logging to stdout and log file, without any Job scheduling system | ||
elif [ "${cmd_backend}" = stdout ]; then | ||
|
||
# The other usage | ||
export train_cmd="stdout.pl" | ||
# Used for "*_train.py": "--gpu" is appended optionally by run.sh | ||
export cuda_cmd="stdout.pl" | ||
# Used for "*_recog.py" | ||
export decode_cmd="stdout.pl" | ||
|
||
|
||
# "qsub" (Sun Grid Engine, or a derivative of it) | ||
elif [ "${cmd_backend}" = sge ]; then | ||
# The default setting is written in conf/queue.conf. | ||
# You must change "-q g.q" to the "queue" for your environment. | ||
# To list the "queue" names, type "qhost -q". | ||
# Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler. | ||
|
||
export train_cmd="queue.pl" | ||
export cuda_cmd="queue.pl" | ||
export decode_cmd="queue.pl" | ||
|
||
|
||
# "qsub" (Torque/PBS.) | ||
elif [ "${cmd_backend}" = pbs ]; then | ||
# The default setting is written in conf/pbs.conf. | ||
|
||
export train_cmd="pbs.pl" | ||
export cuda_cmd="pbs.pl" | ||
export decode_cmd="pbs.pl" | ||
|
||
|
||
# "sbatch" (Slurm) | ||
elif [ "${cmd_backend}" = slurm ]; then | ||
# The default setting is written in conf/slurm.conf. | ||
# You must change "-p cpu" and "-p gpu" to the "partition" names for your environment. | ||
# To list the "partition" names, type "sinfo". | ||
# You can use "--gpu *" by default for slurm, and it is interpreted as "--gres gpu:*". | ||
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}". | ||
|
||
export train_cmd="slurm.pl" | ||
export cuda_cmd="slurm.pl" | ||
export decode_cmd="slurm.pl" | ||
|
||
elif [ "${cmd_backend}" = ssh ]; then | ||
# You have to create ".queue/machines" to specify the hosts that execute jobs. | ||
# e.g. .queue/machines | ||
# host1 | ||
# host2 | ||
# host3 | ||
# This assumes you can log in to them without a password, i.e. you have to set up ssh keys. | ||
|
||
export train_cmd="ssh.pl" | ||
export cuda_cmd="ssh.pl" | ||
export decode_cmd="ssh.pl" | ||
|
||
# This is an example of specifying several unique options in the JHU CLSP cluster setup. | ||
# Users can modify/add their own command options according to their cluster environments. | ||
elif [ "${cmd_backend}" = jhu ]; then | ||
|
||
export train_cmd="queue.pl --mem 2G" | ||
export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf" | ||
export decode_cmd="queue.pl --mem 4G" | ||
|
||
else | ||
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2 | ||
return 1 | ||
fi |
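The header comment of this file describes how `JOB=1:<nj>` is expanded: the name to the left of `=` is replaced by the job index in both the log file name and the command. The sketch below imitates only that substitution so it can be tried without Kaldi installed. The function `expand_jobs` is a hypothetical stand-in, not part of `run.pl`, and it prints the expanded arguments instead of running them.

```shell
# Hypothetical stand-in for the "JOB=1:<nj>" expansion; this is not run.pl.
# It prints one expanded "<log> <command...>" line per job.
expand_jobs() (
  spec="$1"; shift
  name="${spec%%=*}"      # left of "=", e.g. JOB
  range="${spec#*=}"      # right of "=", e.g. 1:3
  n="${range%%:*}"        # first job index
  last="${range##*:}"     # last job index
  if [ "${n}" -lt 1 ]; then
    echo "expand_jobs: the range must start with a positive number" >&2
    exit 1
  fi
  while [ "${n}" -le "${last}" ]; do
    # Replace every occurrence of the job name in the remaining arguments.
    echo "$*" | sed "s/${name}/${n}/g"
    n=$((n + 1))
  done
)

expand_jobs JOB=1:3 echo.JOB.log echo JOB
# prints:
#   echo.1.log echo 1
#   echo.2.log echo 2
#   echo.3.log echo 3
```

With the real scripts, the same arguments would follow `${train_cmd}`, `${cuda_cmd}`, or `${decode_cmd}`, so the recipe code stays identical across backends.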
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
beam_size: 20 | ||
ctc_weight: 0.3 | ||
lm_weight: 0.0 | ||
maxlenratio: 1.0 | ||
minlenratio: 0.0 | ||
penalty: 0.0 |
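To make the role of these weights concrete, the following is a sketch of the joint score that, to my understanding, ESPnet's beam search assigns to a hypothesis $y$ given input $x$. The symbols are my own notation: $\lambda_{\mathrm{ctc}}$ is `ctc_weight`, $\lambda_{\mathrm{lm}}$ is `lm_weight`, and $\gamma$ is `penalty`.

```latex
\[
\mathrm{score}(y \mid x)
  = (1 - \lambda_{\mathrm{ctc}})\,\log p_{\mathrm{att}}(y \mid x)
  + \lambda_{\mathrm{ctc}}\,\log p_{\mathrm{ctc}}(y \mid x)
  + \lambda_{\mathrm{lm}}\,\log p_{\mathrm{lm}}(y)
  + \gamma\,|y|
\]
% With the values in this file (ctc_weight = 0.3, lm_weight = 0.0, penalty = 0.0):
\[
\mathrm{score}(y \mid x)
  = 0.7\,\log p_{\mathrm{att}}(y \mid x) + 0.3\,\log p_{\mathrm{ctc}}(y \mid x)
\]
```

The zero `lm_weight` is consistent with the recipe passing `--use_lm false` in `run.sh`.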
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Default configuration | ||
command qsub -V -v PATH -S /bin/bash | ||
option name=* -N $0 | ||
option mem=* -l mem=$0 | ||
option mem=0 # Do not add anything to qsub_opts | ||
option num_threads=* -l ncpus=$0 | ||
option num_threads=1 # Do not add anything to qsub_opts | ||
option num_nodes=* -l nodes=$0:ppn=1 | ||
default gpu=0 | ||
option gpu=0 | ||
option gpu=* -l ngpus=$0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Default configuration | ||
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* | ||
option name=* -N $0 | ||
option mem=* -l mem_free=$0,ram_free=$0 | ||
option mem=0 # Do not add anything to qsub_opts | ||
option num_threads=* -pe smp $0 | ||
option num_threads=1 # Do not add anything to qsub_opts | ||
option max_jobs_run=* -tc $0 | ||
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1 | ||
default gpu=0 | ||
option gpu=0 | ||
option gpu=* -l gpu=$0 -q g.q |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Default configuration | ||
command sbatch --export=PATH | ||
option name=* --job-name $0 | ||
option time=* --time $0 | ||
option mem=* --mem-per-cpu $0 | ||
option mem=0 | ||
option num_threads=* --cpus-per-task $0 | ||
option num_threads=1 --cpus-per-task 1 | ||
option num_nodes=* --nodes $0 | ||
default gpu=0 | ||
option gpu=0 -p cpu | ||
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Recommend allocating more CPU than, or equal to the number of GPU | ||
# note: the --max-jobs-run option is supported as a special case | ||
# by slurm.pl and you don't have to handle it in the config file. |
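Each `option name=value flags` line in these config files maps a generic option to backend flags: `$0` stands for the value passed on the command line, an exact match such as `gpu=0` takes precedence over the wildcard `gpu=*`, and a `default` line supplies a value when the option is not given. The sketch below hard-codes a few lines of the Slurm config above to show the resulting command line. The function `render_sbatch` is hypothetical and is not how `slurm.pl` is implemented; the order of the emitted flags is also my own choice.

```shell
# Hypothetical helper mirroring a few lines of conf/slurm.conf; not slurm.pl's real code.
render_sbatch() (
  gpu=0                                   # "default gpu=0"
  opts="sbatch --export=PATH"             # "command sbatch --export=PATH"
  while [ $# -gt 0 ]; do
    case "$1" in
      --gpu) gpu="$2" ;;
      # "option mem=0" adds nothing; "option mem=*" adds --mem-per-cpu $0
      --mem) if [ "$2" != 0 ]; then opts="${opts} --mem-per-cpu $2"; fi ;;
      --num-threads) opts="${opts} --cpus-per-task $2" ;;
      --time) opts="${opts} --time $2" ;;
      *) echo "render_sbatch: unknown option $1" >&2; exit 1 ;;
    esac
    shift 2
  done
  if [ "${gpu}" = 0 ]; then
    opts="${opts} -p cpu"                                # "option gpu=0 -p cpu"
  else
    opts="${opts} -p gpu --gres=gpu:${gpu} -c ${gpu}"    # "option gpu=* ..."
  fi
  echo "${opts}"
)

render_sbatch --gpu 1 --mem 4G
# prints: sbatch --export=PATH --mem-per-cpu 4G -p gpu --gres=gpu:1 -c 1
render_sbatch --num-threads 2
# prints: sbatch --export=PATH --cpus-per-task 2 -p cpu
```

The partition names `cpu` and `gpu` come straight from the config above and usually need to be changed for a given cluster.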
93 changes: 93 additions & 0 deletions
egs2/librispeech_100/asr2/conf/train_discrete_asr_e_branchformer1_1gpu.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
# Trained with A6000 (48 GB) x 1 GPU for Kmeans2K+nbpe5K. It takes 6 minutes per epoch. | ||
# BPE-Dropout (https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout) | ||
src_tokenizer_encode_conf: | ||
enable_sampling: true # If set to true, bpe-dropout is used. | ||
alpha: 0.4 | ||
nbest_size: -1 | ||
|
||
frontend: embed # embedding + positional encoding | ||
frontend_conf: | ||
embed_dim: 512 | ||
positional_dropout_rate: 0.1 | ||
|
||
specaug: specaug | ||
specaug_conf: | ||
apply_time_warp: false | ||
time_warp_window: 5 | ||
time_warp_mode: bicubic | ||
apply_freq_mask: false | ||
freq_mask_width_range: | ||
- 0 | ||
- 10 | ||
num_freq_mask: 0 | ||
apply_time_mask: true | ||
time_mask_width_ratio_range: | ||
- 0. | ||
- 0.05 | ||
num_time_mask: 10 | ||
|
||
encoder: e_branchformer | ||
encoder_conf: | ||
output_size: 256 | ||
attention_heads: 4 | ||
attention_layer_type: rel_selfattn | ||
pos_enc_layer_type: rel_pos | ||
rel_pos_type: latest | ||
cgmlp_linear_units: 1024 | ||
cgmlp_conv_kernel: 31 | ||
use_linear_after_conv: false | ||
gate_activation: identity | ||
num_blocks: 12 | ||
dropout_rate: 0.1 | ||
positional_dropout_rate: 0.1 | ||
attention_dropout_rate: 0.1 | ||
input_layer: conv1d2 | ||
layer_drop_rate: 0.0 | ||
linear_units: 1024 | ||
positionwise_layer_type: linear | ||
use_ffn: true | ||
macaron_ffn: true | ||
merge_conv_kernel: 31 | ||
|
||
decoder: transformer | ||
decoder_conf: | ||
attention_heads: 4 | ||
linear_units: 2048 | ||
num_blocks: 6 | ||
dropout_rate: 0.1 | ||
positional_dropout_rate: 0.1 | ||
self_attention_dropout_rate: 0.1 | ||
src_attention_dropout_rate: 0.1 | ||
layer_drop_rate: 0.0 | ||
|
||
model: discrete_asr | ||
model_conf: | ||
ctc_weight: 0.3 | ||
lsm_weight: 0.1 | ||
length_normalized_loss: false | ||
share_decoder_input_output_embed: false | ||
share_encoder_decoder_input_embed: false | ||
|
||
use_amp: true | ||
num_att_plot: 1 | ||
log_interval: 1000 | ||
num_workers: 4 | ||
batch_type: numel | ||
batch_bins: 280000000 | ||
accum_grad: 2 | ||
max_epoch: 70 | ||
patience: none | ||
init: none | ||
best_model_criterion: | ||
- - valid | ||
- acc | ||
- max | ||
keep_nbest_models: 10 | ||
|
||
optim: adam | ||
optim_conf: | ||
lr: 0.006 | ||
weight_decay: 0.000001 | ||
scheduler: warmuplr | ||
scheduler_conf: | ||
warmup_steps: 15000 |
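The `warmuplr` scheduler with `lr: 0.006` and `warmup_steps: 15000` can be written out as follows. This is a sketch based on my understanding of ESPnet's `WarmupLR`; the symbols $n$ (optimizer step), $\mathrm{lr}_0$ (the configured `lr`), and $W$ (`warmup_steps`) are my own notation.

```latex
\[
\mathrm{lr}(n) = \mathrm{lr}_0 \cdot W^{0.5} \cdot \min\!\left(n^{-0.5},\; n \cdot W^{-1.5}\right),
\qquad \mathrm{lr}_0 = 0.006,\quad W = 15000
\]
% Linear warm-up for n <= W, inverse square-root decay afterwards:
\[
\mathrm{lr}(n) =
\begin{cases}
  \mathrm{lr}_0 \, \dfrac{n}{W}, & n \le W \\[1.5ex]
  \mathrm{lr}_0 \, \sqrt{\dfrac{W}{n}}, & n > W
\end{cases}
\]
% Examples: lr(7500) = 0.003, lr(15000) = 0.006 (peak), lr(60000) = 0.003.
```

Under this reading, the configured `lr` is the peak value, reached exactly at step 15000.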
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr2/db.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../asr1/local/data.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../asr1/local/data_prep.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../asr1/local/download_and_untar.sh |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr2/path.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr2/pyscripts |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
#!/usr/bin/env bash | ||
# Set bash to 'debug' mode; it will exit on: | ||
# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline' ('-x', which prints commands, is not enabled here) | ||
set -e | ||
set -u | ||
set -o pipefail | ||
|
||
|
||
kmeans_feature="wavlm_large/21" # use model_type/layer_index | ||
nclusters=2000 | ||
|
||
src_lang=$(echo "${kmeans_feature}_km${nclusters}" | tr "/" "_") | ||
tgt_lang=en | ||
|
||
train_set="train_clean_100" | ||
train_dev="dev" | ||
test_sets="test_clean test_other dev_clean dev_other" | ||
|
||
asr_config=conf/train_discrete_asr_e_branchformer1_1gpu.yaml | ||
inference_config=conf/decode_ctc0.3.yaml | ||
|
||
src_nbpe=6000 # I use src_nbpe=6000 for 2000-cluster kmeans. | ||
tgt_nbpe=5000 # if token_joint is True, then only tgt_nbpe is used | ||
|
||
# ts: true sequence | ||
# rm: deduplicated sequence which removes duplicated tokens | ||
src_case="rm" | ||
tgt_case="ts" | ||
|
||
./asr2.sh \ | ||
--kmeans_opts "--batch_bins 4800000 --nj 4" \ | ||
--kmeans_feature "${kmeans_feature}" \ | ||
--nclusters "${nclusters}" \ | ||
--ngpu 1 \ | ||
--src_lang "${src_lang}" \ | ||
--tgt_lang "${tgt_lang}" \ | ||
--src_token_type "bpe" \ | ||
--src_nbpe "${src_nbpe}" \ | ||
--tgt_token_type "bpe" \ | ||
--tgt_nbpe "${tgt_nbpe}" \ | ||
--src_case "${src_case}" \ | ||
--tgt_case "${tgt_case}" \ | ||
--speed_perturb_factors "0.9 1.0 1.1" \ | ||
--use_lm false \ | ||
--asr_config "${asr_config}" \ | ||
--inference_config "${inference_config}" \ | ||
--train_set "${train_set}" \ | ||
--valid_set "${train_dev}" \ | ||
--test_sets "${test_sets}" \ | ||
--src_bpe_train_text "data/${train_set}/text.${src_case}.${src_lang}" \ | ||
--tgt_bpe_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang}" \ | ||
--lm_train_text "data/${train_set}/text.${tgt_case}.${tgt_lang}" "$@" |
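Two details of this script are easy to miss: how `src_lang` and the BPE training text path are derived, and what the `rm` (deduplicated) source case means. The sketch below reuses the variable values from `run.sh` for the first part. The function `dedup_tokens` is a hypothetical illustration of collapsing runs of repeated tokens; it is not the code `asr2.sh` uses, and the integer token ids are made up for the example.

```shell
kmeans_feature="wavlm_large/21"
nclusters=2000
src_case="rm"
train_set="train_clean_100"

# Same derivation as in run.sh: "/" becomes "_" so the name can be used in file names.
src_lang=$(echo "${kmeans_feature}_km${nclusters}" | tr "/" "_")
echo "${src_lang}"                                      # prints wavlm_large_21_km2000
echo "data/${train_set}/text.${src_case}.${src_lang}"   # prints data/train_clean_100/text.rm.wavlm_large_21_km2000

# Hypothetical illustration of the "rm" case: collapse runs of repeated tokens.
dedup_tokens() (
  echo "$1" | awk '{
    out = ""; prev = ""
    for (i = 1; i <= NF; i++) {
      # Keep a token only if it differs from the one right before it.
      if (i == 1 || $i != prev) { out = (out == "") ? $i : out " " $i }
      prev = $i
    }
    print out
  }'
)

dedup_tokens "5 5 5 12 12 7 5"   # prints 5 12 7 5
```

Only consecutive repeats are removed, so the trailing `5` survives because it is separated from the first run by other tokens.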
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr2/scripts |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../tools/kaldi/egs/wsj/s5/steps |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../tools/kaldi/egs/wsj/s5/utils |