Merge pull request #4746 from wanchichen/fleurs_icassp_results
add fleurs conformer+sc-ctc results
sw005320 committed Nov 4, 2022
2 parents 143c946 + 3719a13 commit 86c2642
Showing 3 changed files with 126 additions and 1 deletion.
36 changes: 36 additions & 0 deletions egs2/fleurs/asr1/README.md
@@ -1,4 +1,40 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# Multilingual ASR - SSL + Conformer + Self-condition [XLS-R, Conformer, utt_mvn, 6500 BPE](conf/train_asr_conformer_scctc.yaml)

## Environments
- date: `Sat Oct 22 14:55:21 EDT 2022`
- python version: `3.8.6 (default, Dec 17 2020, 16:57:01) [GCC 10.2.0]`
- espnet version: `espnet 202207`
- pytorch version: `pytorch 1.8.1+cu102`
- Git hash: `e534106b837ff6cdd29977a52983c022ff1afb0f`
- Commit date: `Sun Sep 11 22:31:23 2022 -0400`
- Pretrained Model: https://huggingface.co/espnet/wanchichen_fleurs_asr_conformer_scctc

## asr_train_asr_conformer_scctc_raw_all_bpe6500_sp

### Language Identification
|dataset|Accuracy|
|---|---|
|decode_asr_lm_lm_train_lm_all_bpe6500_valid.loss.ave_asr_model_valid.acc.ave_3best/test_all|0.9541 (74237/77809)|
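The accuracy cell reports the ratio of correctly identified utterances to the total test set. A quick sanity check of the reported figure (all numbers taken from the table above):

```python
# Language-ID accuracy on test_all: 74237 correctly identified
# utterances out of 77809, as reported in the table above.
correct, total = 74237, 77809
accuracy = correct / total
print(f"{accuracy:.4f}")  # prints 0.9541
```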

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_all_bpe6500_valid.loss.ave_asr_model_valid.acc.ave_3best/test_all|77809|1592160|70.5|26.1|3.4|3.4|32.9|97.0|

### CER

|dataset|Snt|Chr|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_all_bpe6500_valid.loss.ave_asr_model_valid.acc.ave_3best/test_all|77809|10235271|92.2|4.7|3.1|2.6|10.4|97.0|

### TER

|dataset|Snt|Tok|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_lm_lm_train_lm_all_bpe6500_valid.loss.ave_asr_model_valid.acc.ave_3best/test_all|77809|9622352|91.3|5.6|3.1|2.7|11.4|97.0|
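The sclite-style columns in the three tables above satisfy two identities, both expressed as percentages of the reference tokens: Corr + Sub + Del = 100, and Err = Sub + Del + Ins (insertions add to the error rate but are not part of the reference, which is why Err can exceed 100 − Corr). A small check against the reported rows:

```python
# Verify the two sclite identities for the WER, CER, and TER rows
# reported above: Corr + Sub + Del = 100 and Err = Sub + Del + Ins.
rows = {
    "WER": dict(corr=70.5, sub=26.1, dele=3.4, ins=3.4, err=32.9),
    "CER": dict(corr=92.2, sub=4.7, dele=3.1, ins=2.6, err=10.4),
    "TER": dict(corr=91.3, sub=5.6, dele=3.1, ins=2.7, err=11.4),
}
for name, r in rows.items():
    # Allow a small tolerance for the one-decimal rounding in the tables.
    assert abs(r["corr"] + r["sub"] + r["dele"] - 100.0) < 0.1
    assert abs(r["sub"] + r["dele"] + r["ins"] - r["err"]) < 0.1
    print(name, "ok")
```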

# Multilingual ASR - Self-supervised learning features [HuBERT_large_ll60k, Transformer, utt_mvn, 6500 BPE](conf/train_asr_hubert_large_ll60k_transformer.yaml)

## Environments
2 changes: 1 addition & 1 deletion egs2/fleurs/asr1/conf/train_asr.yaml
89 changes: 89 additions & 0 deletions egs2/fleurs/asr1/conf/tuning/train_asr_conformer_scctc.yaml
@@ -0,0 +1,89 @@
batch_type: numel
batch_bins: 12000000
accum_grad: 4
max_epoch: 10
patience: none
# The initialization method for model parameters
init: xavier_uniform
best_model_criterion:
- - valid
- acc
- max
keep_nbest_models: 3

encoder: conformer
encoder_conf:
output_size: 512
attention_heads: 8
linear_units: 2048
num_blocks: 12
dropout_rate: 0.1
positional_dropout_rate: 0.1
attention_dropout_rate: 0.1
input_layer: conv2d2
normalize_before: true
macaron_style: true
pos_enc_layer_type: "rel_pos"
selfattention_layer_type: "rel_selfattn"
activation_type: "swish"
use_cnn_module: true
cnn_module_kernel: 31
interctc_layer_idx: [3, 6, 9,]
interctc_use_conditioning: true

decoder: transformer
decoder_conf:
attention_heads: 8
linear_units: 2048
num_blocks: 6
dropout_rate: 0.1
positional_dropout_rate: 0.1
self_attention_dropout_rate: 0.1
src_attention_dropout_rate: 0.1

model_conf:
ctc_weight: 0.3
lsm_weight: 0.1
interctc_weight: 0.5
length_normalized_loss: false
extract_feats_in_collect_stats: false

optim: adam
optim_conf:
lr: 0.002
scheduler: warmuplr
scheduler_conf:
warmup_steps: 25000

specaug: specaug
specaug_conf:
apply_time_warp: true
time_warp_window: 5
time_warp_mode: bicubic
apply_freq_mask: true
freq_mask_width_range:
- 0
- 30
num_freq_mask: 2
apply_time_mask: true
time_mask_width_range:
- 0
- 40
num_time_mask: 2

freeze_param: [
"frontend.upstream"
]

frontend: s3prl
frontend_conf:
frontend_conf:
upstream: wav2vec2_url # Note: If the upstream is changed, please change the input_size in the preencoder.
path_or_url: https://huggingface.co/s3prl/converted_ckpts/resolve/main/xlsr2_300m.pt
download_dir: ./hub
multilayer_feature: True

preencoder: linear
preencoder_conf:
input_size: 1024 # Note: If the upstream is changed, please change this value accordingly.
output_size: 80
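The config above trains a hybrid CTC/attention model with self-conditioned intermediate CTC: auxiliary CTC losses are computed at encoder layers 3, 6, and 9 (`interctc_layer_idx`) and their predictions are fed back into the encoder (`interctc_use_conditioning`). A minimal sketch of the loss weighting implied by `ctc_weight: 0.3` and `interctc_weight: 0.5`, following my reading of ESPnet2's model code (the intermediate losses are averaged, blended with the final-layer CTC loss, then mixed with the attention loss); the loss values below are dummies for illustration:

```python
# Sketch of the hybrid CTC/attention loss with intermediate CTC,
# using the weights from the config above. Dummy loss values only.
ctc_weight = 0.3       # model_conf: ctc_weight
interctc_weight = 0.5  # model_conf: interctc_weight

def combined_loss(loss_att, loss_ctc, intermediate_ctc_losses):
    # Average the auxiliary CTC losses from encoder layers 3, 6, 9.
    loss_interctc = sum(intermediate_ctc_losses) / len(intermediate_ctc_losses)
    # Blend final-layer CTC with the intermediate-CTC average.
    loss_ctc = (1 - interctc_weight) * loss_ctc + interctc_weight * loss_interctc
    # Mix the CTC branch with the attention-decoder loss.
    return ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att

# e.g. att=1.0, final ctc=2.0, intermediate=[3.0, 2.5, 2.0] -> 1.375
print(combined_loss(1.0, 2.0, [3.0, 2.5, 2.0]))
```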
