Add E-Branchformer configs and models in ASR recipes #4837

Merged on Jan 4, 2023 (18 commits).
34 changes: 30 additions & 4 deletions egs2/aidatatang_200zh/asr1/README.md
@@ -1,5 +1,29 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
# E-Branchformer

## Environments
- date: `Mon Dec 26 19:46:01 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `7a203d55543df02f0369d5608cd6f3033119a135`
- Commit date: `Fri Dec 23 00:58:49 2022 +0000`

## asr_train_asr_e_branchformer_linear1024_raw_zh_char_sp
- ASR config: [conf/tuning/train_asr_e_branchformer_linear1024.yaml](conf/tuning/train_asr_e_branchformer_linear1024.yaml)
- Params: 37.66M
- LM config: [conf/train_lm_transformer.yaml](conf/train_lm_transformer.yaml)
- Model link: [https://huggingface.co/pyf98/aidatatang_200zh_e_branchformer](https://huggingface.co/pyf98/aidatatang_200zh_e_branchformer)

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_zh_char_valid.loss.ave_asr_model_valid.acc.ave/dev|24216|234524|96.6|3.0|0.4|0.1|3.6|18.4|
|decode_asr_lm_lm_train_lm_transformer_zh_char_valid.loss.ave_asr_model_valid.acc.ave/test|48144|468933|95.9|3.6|0.4|0.2|4.2|20.8|
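The CER tables throughout this PR report Corr/Sub/Del/Ins as percentages of reference tokens (for these character-level Mandarin results, the Wrd column counts characters), with Err the total error rate and S.Err the sentence error rate. As a rough illustration of what the Err column measures, here is a minimal Levenshtein-based character error rate in plain Python (a sketch only; ESPnet's actual scoring uses sclite-style alignment):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between reference and hypothesis sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, analogous to the Err column."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```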


# Conformer

## Environments
- date: `Fri Dec 24 23:34:58 EST 2021`
- python version: `3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]`
@@ -9,11 +33,13 @@
- Commit date: `Wed Dec 22 14:08:29 2021 -0500`

## asr_train_asr_conformer_raw_zh_char_sp
- ASR config: [conf/train_asr_conformer.yaml](conf/train_asr_conformer.yaml)
- Params: 45.98M
- Model link: [https://huggingface.co/sw005320/aidatatang_200zh_conformer](https://huggingface.co/sw005320/aidatatang_200zh_conformer)

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_zh_char_valid.loss.ave_asr_model_valid.acc.ave/dev|24216|234524|96.6|3.0|0.5|0.1|3.6|18.5|
|decode_asr_lm_lm_train_lm_transformer_zh_char_valid.loss.ave_asr_model_valid.acc.ave/test|48144|468933|95.9|3.6|0.4|0.2|4.3|21.0|
@@ -0,0 +1,80 @@
# network architecture
# encoder related
encoder: e_branchformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    attention_layer_type: rel_selfattn
    pos_enc_layer_type: rel_pos
    rel_pos_type: latest
    cgmlp_linear_units: 1024
    cgmlp_conv_kernel: 31
    use_linear_after_conv: false
    gate_activation: identity
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    layer_drop_rate: 0.0
    linear_units: 1024
    positionwise_layer_type: linear
    use_ffn: true
    macaron_ffn: true
    merge_conv_kernel: 31

# decoder related
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false

# minibatch related
batch_type: numel
batch_bins: 16000000

# optimization related
use_amp: true
num_workers: 4
accum_grad: 1
grad_clip: 5
max_epoch: 50
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

optim: adam
optim_conf:
    lr: 0.0005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
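The `warmuplr` scheduler in this config ramps the learning rate up to its peak (`lr: 0.0005`) over `warmup_steps: 30000` and then decays it as the inverse square root of the step count. A minimal sketch, assuming the standard Noam-style formula that ESPnet's `WarmupLR` is based on:

```python
def warmup_lr(step: int, base_lr: float = 5e-4, warmup_steps: int = 30000) -> float:
    """Noam-style schedule: linear-ish ramp to base_lr at warmup_steps,
    then inverse-sqrt decay. A sketch, not ESPnet's actual class."""
    step = max(step, 1)  # guard against step 0
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
```

At `step == warmup_steps` the two branches of the `min` meet and the schedule attains exactly `base_lr`.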
3 changes: 2 additions & 1 deletion egs2/aidatatang_200zh/asr1/run.sh
@@ -9,7 +9,7 @@ train_set=train
valid_set=dev
test_sets="dev test"

asr_config=conf/train_asr_e_branchformer.yaml
inference_config=conf/decode_asr.yaml

lm_config=conf/train_lm_transformer.yaml
@@ -20,6 +20,7 @@ use_lm=true
speed_perturb_factors="0.9 1.0 1.1"

./asr.sh \
--ngpu 2 \
--lang zh \
--audio_format wav \
--feats_type raw \
108 changes: 106 additions & 2 deletions egs2/chime4/asr1/README.md
@@ -1,5 +1,109 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
# E-Branchformer: 10 layers
## Environments
- date: `Wed Dec 28 15:49:24 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `f9a8009aef6ff9ba192a78c19b619ae4a9f3b9d2`
- Commit date: `Wed Dec 28 00:30:54 2022 -0500`

## asr_train_asr_e_branchformer_e10_mlp1024_linear1024_macaron_lr1e-3_warmup25k_raw_en_char_sp
- ASR config: [conf/tuning/train_asr_e_branchformer_e10_mlp1024_linear1024_macaron_lr1e-3_warmup25k.yaml](conf/tuning/train_asr_e_branchformer_e10_mlp1024_linear1024_macaron_lr1e-3_warmup25k.yaml)
- Params: 30.79M
- LM config: [conf/train_lm_transformer.yaml](conf/train_lm_transformer.yaml)
- Model link: [https://huggingface.co/pyf98/chime4_e_branchformer_e10](https://huggingface.co/pyf98/chime4_e_branchformer_e10)

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|27119|93.7|5.0|1.2|0.6|6.8|52.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|27120|92.4|6.1|1.6|0.7|8.4|58.2|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|21409|90.2|8.0|1.8|1.0|10.8|60.2|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|21416|88.4|9.3|2.4|1.4|13.0|66.1|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|160390|97.4|1.3|1.3|0.7|3.3|52.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|160400|96.6|1.8|1.7|0.9|4.3|58.2|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|126796|95.7|2.3|2.0|1.1|5.4|60.2|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|126812|94.4|2.8|2.8|1.5|7.2|66.1|



# Conformer: 12 layers, 2048 linear units
## Environments
- date: `Wed Dec 28 20:41:40 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `ad91279f0108d54bd22abe29671b376f048822c5`
- Commit date: `Wed Dec 28 20:15:42 2022 -0500`

## asr_train_asr_conformer_e12_linear2048_raw_en_char_sp
- ASR config: [conf/tuning/train_asr_conformer_e12_linear2048.yaml](conf/tuning/train_asr_conformer_e12_linear2048.yaml)
- Params: 43.04M
- LM config: [conf/train_lm_transformer.yaml](conf/train_lm_transformer.yaml)
- Model link: [https://huggingface.co/pyf98/chime4_conformer_e12_linear2048](https://huggingface.co/pyf98/chime4_conformer_e12_linear2048)

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|27119|93.3|5.4|1.3|0.5|7.3|55.6|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|27120|91.7|6.7|1.6|0.9|9.1|62.0|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|21409|89.2|8.9|1.9|1.1|12.0|64.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|21416|87.8|9.6|2.6|1.4|13.6|68.1|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|160390|97.2|1.5|1.3|0.7|3.5|55.6|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|160400|96.3|2.0|1.7|1.0|4.7|62.0|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|126796|95.1|2.8|2.1|1.2|6.1|64.6|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|126812|94.0|3.1|3.0|1.6|7.7|68.1|



# Conformer: 12 layers, 1024 linear units
## Environments
- date: `Wed Dec 28 15:49:24 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `f9a8009aef6ff9ba192a78c19b619ae4a9f3b9d2`
- Commit date: `Wed Dec 28 00:30:54 2022 -0500`

## asr_train_asr_conformer_e12_linear1024_raw_en_char_sp
- ASR config: [conf/tuning/train_asr_conformer_e12_linear1024.yaml](conf/tuning/train_asr_conformer_e12_linear1024.yaml)
- Params: 30.43M
- LM config: [conf/train_lm_transformer.yaml](conf/train_lm_transformer.yaml)
- Model link: [https://huggingface.co/pyf98/chime4_conformer_e12_linear1024](https://huggingface.co/pyf98/chime4_conformer_e12_linear1024)

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|27119|92.8|5.8|1.5|0.6|7.8|56.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|27120|91.3|6.7|2.0|0.8|9.5|60.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|21409|88.6|9.2|2.1|1.2|12.5|63.8|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|21416|86.5|10.4|3.1|1.3|14.8|70.9|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|160390|96.9|1.6|1.5|0.7|3.8|56.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|160400|96.0|2.0|2.0|1.0|4.9|60.5|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|126796|94.8|2.8|2.3|1.2|6.4|63.9|
|decode_asr_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|126812|93.1|3.4|3.4|1.5|8.4|70.9|



# RNN, fbank_pitch
## Environments
- date: `Sun Mar 1 17:52:59 EST 2020`
- python version: `3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]`
6 changes: 6 additions & 0 deletions egs2/chime4/asr1/conf/decode_asr.yaml
@@ -0,0 +1,6 @@
beam_size: 10
ctc_weight: 0.3
lm_weight: 1.0
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
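During beam search, these weights combine the per-hypothesis log-probabilities: the attention-decoder and CTC scores are interpolated by `ctc_weight`, the LM score is added with `lm_weight`, and `penalty` acts as a per-token length bonus. A minimal sketch of the weighted sum (my reading of ESPnet2's scorer combination, not its actual API):

```python
def combined_score(att: float, ctc: float, lm: float, n_tokens: int,
                   ctc_weight: float = 0.3, lm_weight: float = 1.0,
                   penalty: float = 0.0) -> float:
    """Weighted log-probability used to rank beam-search hypotheses,
    mirroring the weights in decode_asr.yaml above (a sketch)."""
    return ((1.0 - ctc_weight) * att   # attention decoder score
            + ctc_weight * ctc         # CTC prefix score
            + lm_weight * lm           # external LM score
            + penalty * n_tokens)      # length bonus (0.0 here)
```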
1 change: 1 addition & 0 deletions egs2/chime4/asr1/conf/train_asr_e_branchformer.yaml
@@ -0,0 +1,77 @@
# Trained with NVIDIA V100 GPU (32GB) x 1.
encoder: conformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    linear_units: 1024
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    normalize_before: true
    macaron_style: true
    rel_pos_type: latest
    pos_enc_layer_type: rel_pos
    selfattention_layer_type: rel_selfattn
    activation_type: swish
    use_cnn_module: true
    cnn_module_kernel: 31

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

frontend_conf:
    n_fft: 512
    win_length: 400
    hop_length: 160

seed: 2022
use_amp: true
num_workers: 4
batch_type: numel
batch_bins: 15000000
accum_grad: 1
max_epoch: 60
init: none
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 0.000001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 27
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_ratio_range:
    - 0.
    - 0.05
    num_time_mask: 2
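As configured here, SpecAugment applies (among time warping and time masks) two random frequency masks, each zeroing a band of width 0 to 27 bins. A stdlib-only sketch of the frequency-mask step on a list-of-rows spectrogram, not ESPnet's tensor implementation:

```python
import random

def apply_freq_masks(spec, width_range=(0, 27), num_masks=2, rng=None):
    """Zero out `num_masks` random frequency bands in-place.
    `spec` is a [time][freq] list of lists. Widths follow
    freq_mask_width_range / num_freq_mask from the config above."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility in this sketch
    n_freq = len(spec[0])
    for _ in range(num_masks):
        width = rng.randint(width_range[0], width_range[1])
        start = rng.randint(0, max(0, n_freq - width))
        for row in spec:                      # same band masked at every frame
            for f in range(start, start + width):
                row[f] = 0.0
    return spec
```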